Readers familiar with this blog know that I’ve been working on a model to predict success in the NBA using the Wins Produced metric (see the Basics here). In a sense, it’s the mission statement of this blog. The intent is to shake out the tools and build a model piece by piece, put it through its paces, rinse and repeat, and over time get closer to simulating the truth.

The development version is already up and released to the public for beta testing (see here), and the full pre-season build is coming (with endless refinements as the season goes along), but before I get to that I need to deal with one of my favorite topics: the draft and rookies.

Now the draft is notoriously hard to model, and a simple answer would be to just use some dummy variables for rookies and carry on, but readers by now know that I never take the easy path. So the question becomes: how do we model rookies?

For this exercise, I went ahead and did a full build combining all the combine data from Draft Express (yes, all of it; I have been working on this for a while) with all the WP48 data for rookies. Then I took the data and started looking for variables that correlate to rookie-year Raw Productivity per 48 minutes (ADJP48). Please note that I said rookie year and not first 4 years; that is a slightly different model (and post :-)). I found the following variables that correlate in a meaningful way:

- Height
- Position
- Age when drafted
- Win Score per 40 minutes

The equation I came up with based on these variables is:

ADJP48 = K – A* HEIGHT + B* SIMPOS – C* DFTAGE + D* WS40

Where K, A, B, C, and D are constants.
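For readers who want to try this at home, here is a minimal sketch of how an equation of this form could be fit with ordinary least squares. It is not the actual build: the data below is randomly generated, the coefficient values are hypothetical placeholders, and the function name `fit_rookie_model` and the 1 to 5 coding of SIMPOS are assumptions made for illustration.

```python
import numpy as np

def fit_rookie_model(height, simpos, dftage, ws40, adjp48):
    """Ordinary least squares fit of ADJP48 on the four predictors.

    Returns [K, height coef, simpos coef, dftage coef, ws40 coef].
    A negative height coefficient corresponds to the "- A*HEIGHT" term.
    """
    X = np.column_stack([np.ones(len(height)), height, simpos, dftage, ws40])
    coef, *_ = np.linalg.lstsq(X, adjp48, rcond=None)
    return coef

# Synthetic, made-up data purely to exercise the code (not real players).
rng = np.random.default_rng(0)
n = 300
height = rng.uniform(72, 86, n)                # inches
simpos = rng.integers(1, 6, n).astype(float)   # assumed coding: 1 = PG ... 5 = C
dftage = rng.uniform(19, 24, n)                # age when drafted
ws40 = rng.normal(8, 3, n)                     # college Win Score per 40 minutes

# Hypothetical coefficients, chosen only so the signs match the equation above.
true_coef = np.array([0.5, -0.01, 0.03, -0.02, 0.02])
X = np.column_stack([np.ones(n), height, simpos, dftage, ws40])
adjp48 = X @ true_coef + rng.normal(0, 0.01, n)

coef = fit_rookie_model(height, simpos, dftage, ws40, adjp48)

# Correlation between fitted and observed values on the synthetic data.
r = np.corrcoef(X @ coef, adjp48)[0, 1]
print(coef, r)
```

With real data you would swap the synthetic arrays for one row per rookie and read the signs of the fitted coefficients against the equation.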

The correlation is 42% across every player coming from college who played more than 400 minutes as a rookie (from 1996 to 2010, that’s 373 players). In graph form it looks something like this:

The full table is here. But what does it actually mean? When I look at the error by Age and Position I see the following:

The model is consistent, and it’ll allow me to look at a player and predict within reason who they’re going to be. I only care about one side of the tail: if my model oversells a player (a false positive) it costs me money; if it undersells him (a false negative) it’s money in my pocket. Given that, the model is better than the straight correlation indicates.

Let’s illustrate. Here are the best-ranked rookies who actually played from 1997 through 2006 (the last ten-year period where the draftees have at least 4 years of data):

If I consider a hit to be drafting a player who is at least a career .090 WP48 player, then the model hit 36 of 50 times, or 72%. So if I have multiple picks in a draft, I’m all but assured a decent player, and since the average pick for the group is 13, these players will be available late. As for the last few years, here are the recommended picks:

You’ll note that Blake Griffin isn’t in this group (he hasn’t played yet), but overall the list is strong. Beasley is the turd in the punch bowl, but I would remind everyone that he’s only played two years in the league (and this might be, by his own admission, the first year he plays clean).

As for the misses?

Missing Lee and Odom hurts but it’ll have to do until we build a better college model.

So now that we have the model the next logical step is to project the incoming 2010 rookie class and I’ll do just that. Tomorrow. In part 2.

Hey Arturo – It looks like you use simple position as a continuous variable here. Do you do better if you make it categorical? I assume the jumps in productivity aren’t the same from point to SG to SF to PF to center.

Very probably. Got to leave some improvement for the next version. I’ll play with running a by-position regression equation.
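A minimal sketch of what the categorical version could look like, assuming positions are coded 1 (point guard) through 5 (center). The helper name `one_hot_positions` is hypothetical; the idea is just to replace the single position column with one 0/1 column per position, so each position gets its own jump instead of a fixed step.

```python
import numpy as np

def one_hot_positions(simpos, n_positions=5):
    """Turn position codes 1..n_positions into 0/1 dummy columns.

    Position 1 is dropped as the baseline so the dummy columns are not
    collinear with the intercept in a regression.
    """
    simpos = np.asarray(simpos, dtype=int)
    dummies = np.zeros((len(simpos), n_positions - 1))
    for col, pos in enumerate(range(2, n_positions + 1)):
        dummies[:, col] = (simpos == pos)
    return dummies

# Example codes: a point guard, a small forward, a center, a shooting guard.
codes = [1, 3, 5, 2]
D = one_hot_positions(codes)
print(D)
```

These four columns would take the place of the single position term, at the cost of three extra constants to estimate.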

Arturo,

The first thing that came to mind when you used ‘height’ in the formula was to refine it to use ‘reach’. It might help remove some anomalies to separate the pterodactyls from the T-Rexes, as some of the pterodactyls overachieve for their height, and vice versa.

We looked at reach as one of the variables and it didn’t really correlate strongly. The combine data has actually been a big waste of time so far. The only questions that matter are:

Can you play?

What position?

Are you tall?

How old are you?

Everything else resembled noise. I will however revisit the combine data and the can you play question in the future.

[…] that I want to talk up one of the show’s hosts Arturo. He recently released a draft model. In it he shows he hit 36/50 times on players he predicted to be good and dings himself for missing […]

When looking at the age of the draftee, is the problem with younger players more that they aren’t mature enough to compete with men yet or is it that we haven’t seen them enough to figure out how good they will be yet? Not sure how you would tease this out in the numbers…

Actually, the model favors younger players. If you have two players with similar numbers, go younger.

Damn Arturo! I want to be just like you when I grow up!

Thanks. Just wait till the sequel! 🙂

[…] I unveiled one of those, my rookie model and it was deemed awesome. Again, Frank Quietly = […]

So, height is bad? Am I misreading your equation or are you burying the lede?

It’s a combined effect. College performance is devalued by height and increases with youth. So the performance number is more likely to correlate if you’re shorter and younger. So a 19 year old 6’6” center who lit it up is more likely to have success. If you’re tall and old you have to dominate in college to dominate in the pros.

If that’s true, I’m going to guess that’s a highly exploitable flaw in teams’ valuations of players. I would assume that most teams think that, all things being equal, a taller player would be better. How much of the difference between actual draft position and your algorithm’s draft position is explained by that single variable being (-) rather than (+)?

Tough to tell, but it’s significant. I’ll run some numbers. The point is height should lead to production or it’s worthless.

[…] performance (Yogi and Boo Boo). For the math behind it see the Basics . For the model build see parts 1 & part 2. Now, we get to the payoff where we feed the numbers for the 2010 rookies into the […]

I see Horford and Speights on the list. Was Noah a miss?

Yogi missed him, Boo Boo got him (see part 2)

[…] Stats” – has offered a few studies of rookies recently (see his “Prove me Wrong” series HERE, HERE, and HERE). His latest – reposted below – is a quick look at the preseason rookie […]

[…] built two models to predict the future performance of NBA draft picks (go here for the model build parts 1 & part 2 ). In very general terms the models use the available data to predict future […]

[…] original build in detail is archived here (parts 1 & part 2 ). In very general terms, the models use the available data to predict future […]