Prove me wrong, Rook Part 1

Readers familiar with this blog know that I’ve been working on a model to predict success in the NBA using the Wins Produced metric (see the Basics here). In a sense, it’s the mission statement of this blog. The intent is to shake out the tools and build a model piece by piece, put it through its paces, rinse and repeat, and over time get closer to simulating the truth.

 

I'm not quite trying to build a universe here (Image courtesy of xkcd.com)

 

The development version is already released to the public for beta testing (see here), and the full pre-season build is coming (with endless refinements as the season goes along). Before I get to that, though, I need to deal with one of my favorite topics: the draft and rookies.

 

Sure, there are better rookie images, but Frank Quitely is awesome

 

Now, the draft is notoriously hard to model, and a simple answer would be to just use some dummy variables for rookies and carry on. But readers by now know that I never take the easy path. So the question becomes: how do we model rookies?

For this exercise, I went ahead and did a full build combining all the combine data from Draft Express (yes, all of it; I have been working on this for a while) with all the WP48 data for rookies. Then I took the data and started looking for variables that correlate with rookie-year Raw Productivity per 48 minutes (ADJP48). Please note that I said rookie year and not first four years; that is a slightly different model (and post :-)). I found the following variables that correlate in a meaningful way:

  • Height
  • Position
  • Age when drafted
  • Win Score per 40 minutes

The equation I came up with based on these variables is:

ADJP48 = K – A* HEIGHT + B* SIMPOS – C* DFTAGE + D* WS40

Where K, A, B, C, and D are constants.
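For readers who want to try something similar, here is a minimal sketch of how an equation of this shape could be fit with ordinary least squares. It is not my actual build: the data below is randomly generated, the coefficient values are made up, the 1-to-5 position coding is an assumption, and the function names `fit_rookie_model` and `predict_adjp48` are hypothetical.

```python
import numpy as np

def fit_rookie_model(height, simpos, dftage, ws40, adjp48):
    """Least-squares fit of ADJP48 on the four predictors.

    Returns [K, b_height, b_simpos, b_dftage, b_ws40]. The fit chooses its
    own signs, so in the post's notation b_height = -A and b_dftage = -C.
    """
    X = np.column_stack([np.ones(len(height)), height, simpos, dftage, ws40])
    coefs, *_ = np.linalg.lstsq(X, adjp48, rcond=None)
    return coefs

def predict_adjp48(coefs, height, simpos, dftage, ws40):
    """Apply the fitted equation to one player or to arrays of players."""
    k, b_h, b_p, b_a, b_w = coefs
    return k + b_h * height + b_p * simpos + b_a * dftage + b_w * ws40

# Synthetic stand-in data; NOT the Draft Express / WP48 data set.
rng = np.random.default_rng(0)
n = 373
height = rng.uniform(72, 86, n)                # inches
simpos = rng.integers(1, 6, n).astype(float)   # assumed coding: 1 = PG ... 5 = C
dftage = rng.uniform(19, 23, n)                # age when drafted
ws40 = rng.normal(8.0, 3.0, n)                 # Win Score per 40 minutes
true_coefs = np.array([1.0, -0.010, 0.020, -0.015, 0.020])  # made-up values
noise = rng.normal(0, 0.01, n)
adjp48 = predict_adjp48(true_coefs, height, simpos, dftage, ws40) + noise

coefs = fit_rookie_model(height, simpos, dftage, ws40, adjp48)
print(dict(zip(["K", "height", "simpos", "dftage", "ws40"], coefs.round(3))))
```

On the synthetic data the fit should recover negative coefficients for height and draft age, matching the signs in the equation above. With real data you would swap the generated arrays for one row per rookie.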

The model has a correlation of 42% across every player who came from college and played more than 400 minutes as a rookie (from 1996 to 2010, that’s 373 players). In graph form it looks something like this:

The full table is here. But what does it actually mean? When I look at the error by Age and Position I see the following:

The model is consistent, and it’ll allow me to look at a player and predict within reason who they’re going to be. I only care about one side of the tail: if my model oversells a player (a false positive) it costs me money, and if it undersells him (a false negative) it’s money in my pocket. Given that, the model is better than the straight correlation indicates.
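To make the one-sided view concrete, here is a small sketch that separates the players a model oversold from the ones it undersold. The numbers are made up purely for illustration, and `split_errors` is a hypothetical helper, not part of my build.

```python
def split_errors(predicted, actual):
    """Separate oversold players (predicted > actual) from undersold ones."""
    over = [p - a for p, a in zip(predicted, actual) if p > a]
    under = [a - p for p, a in zip(predicted, actual) if a > p]
    return {
        "oversold": len(over),    # false positives: these cost money
        "undersold": len(under),  # false negatives: pleasant surprises
        "mean_oversell": sum(over) / len(over) if over else 0.0,
    }

# Made-up predicted and actual productivity values, for illustration only.
predicted = [0.12, 0.08, 0.15, 0.10]
actual = [0.10, 0.11, 0.15, 0.04]
summary = split_errors(predicted, actual)
print(summary)
```

If only the oversold side matters, the size and frequency of those misses is a more useful yardstick than a symmetric measure like correlation.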

Let’s illustrate. Here are the best-ranked rookies who actually played from 1997 through 2006 (the last ten-year period where the draftees have at least four years of data):

If I count as a hit drafting a player who is at least a career .090 WP48 player, then the model hit 36 of 50 times, or 72%. So if I have multiple picks in a draft, I’m assured a decent player, and since the average pick for the group is 13, these players will be available late. As for the last few years, here are the recommended picks:

You’ll note that Blake Griffin isn’t in this group (he hasn’t played yet), but overall the list is strong. Beasley is the turd in the punch bowl, but I would remind everyone that he’s only played two years in the league (and this might be, by his own admission, the first year he plays clean).

As for the misses?

Missing Lee and Odom hurts, but it’ll have to do until we build a better college model.

So now that we have the model, the next logical step is to project the incoming 2010 rookie class, and I’ll do just that. Tomorrow. In part 2.

Part 2 is here

23 Comments

  1. 10/8/2010

    Hey Arturo – It looks like you use simple position as a continuous variable here. Do you do better if you make it categorical? I assume the jumps in productivity aren’t the same from point to SG to SF to PF to center.

    • 10/8/2010

      Very probably. Got to leave some improvement for the next version. I’ll play with running a by-position regression equation.

  2. jglanton
    10/8/2010

    Arturo,
    The first thing that came to mind when you used ‘height’ in the formula was to refine it to use ‘reach’. It might help remove some anomalies to separate the pterodactyls from the T-Rexes, as some of the pterodactyls overachieve for their height, and vice-versa.

    • 10/8/2010

      We looked at reach as one of the variables and it didn’t really correlate strongly. The combine data is actually a big waste of time so far. So far the only questions that matter are:
      Can you play?
      What position?
      Are you tall?
      How old are you?
      Everything else resembled noise. I will however revisit the combine data and the can you play question in the future.

  3. Neal Frazier
    10/9/2010

    When looking at the age of the draftee, is the problem with younger players more that they aren’t mature enough to compete with men yet or is it that we haven’t seen them enough to figure out how good they will be yet? Not sure how you would tease this out in the numbers…

    • 10/9/2010

      Actually, the model favors younger players. If you have two players with similar numbers, go younger.

  4. Shawn Ryan
    10/9/2010

    Damn Arturo! I want to be just like you when I grow up!

  5. Fred Bush
    10/10/2010

    So, height is bad? Am I misreading your equation or are you burying the lede?

    • 10/10/2010

      It’s a combined effect. College performance is devalued by height and increases with youth. So the performance number is more likely to correlate if you’re shorter and younger. So a 19 year old 6’6” center who lit it up is more likely to have success. If you’re tall and old you have to dominate in college to dominate in the pros.

  6. Fred Bush
    10/10/2010

    If that’s true, I’m going to guess that’s a highly exploitable flaw in teams’ valuations of players. I would assume that most teams think that, all things being equal, a taller player would be better. How much of the difference between actual draft position and your algorithm’s draft position is explained by that single variable being (-) rather than (+)?

    • 10/10/2010

      Tough to tell, but it’s significant. I’ll run some numbers. The point is height should lead to production or it’s worthless.

  7. Evanz
    10/11/2010

    I see Horford and Speights on the list. Was Noah a miss?
