Age & Productivity Model for the NBA Revisited

I put up a post earlier today on the 4 and a half team trade that took place today. Reader Tom Mandel asks:

I agree with your analysis — hard not to. Also, as usual, your framing is witty and entertaining. I love that freechrispaul.com is still available!

Yet, can you tell me — methodologically — why it’s preferable to use Ariza’s ’09-10 WP48 rather than his much better ’08-9 WP48 in analyzing his value in the trade?

Obviously, it’s a more recent number — but is that fact a defensible methodological justification? After all, the previous two years provide more minutes, i.e. a better dataset.

Am I wrong to say that the value of your analysis is virtually *entirely* in the data you use? And that the rest is not much more than arithmetic?

Hence, how much confidence do you have that the data you use is the right data to assess this trade? And what justifies that confidence?

I was going to respond to this in the comments, but once I realized I’d put in regression analysis and graphs I decided that we were all better served by having it as a post on its own.

The analysis is a little more complicated than just picking out his last year’s WP48. I’m going to start with age and ADJP48 (raw productivity). I’d published a post on the age model before, and we will revisit some of those topics here (covered in detail here).

There’s a lot of evidence (i.e., regression work) that players ramp up to a certain ADJP48 (raw productivity) and tend to remain within shouting distance of that number regardless of position. Here’s a quick study I did based on the last ten years using just a straight-up linear regression model:
Results for: Last 10 Years
Regression Analysis: ADJP48 versus Age, StdDev ADJP4, Max ADJP48
The regression equation is
ADJP48 = – 0.0631 + 0.00164 Age – 0.251 StdDev ADJP48 (1st 4 Y)
+ 0.850 Max ADJP48  (1st 4 Y)
S = 0.0673895   R-Sq = 69.5%   R-Sq(adj) = 69.4%
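
As a rough illustration, the fitted equation above can be applied directly to a single player. The sketch below hard-codes the reported coefficients; the function name and the sample inputs are made up for illustration and are not real player data. With S at about 0.067, any single prediction should be read as a loose estimate.

```python
def predict_adjp48(age, stddev_first4, max_first4):
    """Apply the reported linear fit to one player-season.

    Coefficients are copied from the regression output above.
    stddev_first4 and max_first4 are the standard deviation and the
    maximum of the player's ADJP48 over his first four years.
    """
    return (-0.0631
            + 0.00164 * age
            - 0.251 * stddev_first4
            + 0.850 * max_first4)


# Hypothetical inputs, not a real player.
estimate = predict_adjp48(age=25, stddev_first4=0.05, max_first4=0.40)
print(round(estimate, 3))
```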

So age and prior performance explain a significant amount (about 70% of the variance) of future player performance. Now the age curve looks something like this (looking at players born since 1970):

And graphically like this:

You see a gradual increase up to about age 24; players then remain at a consistent level until they hit a gradual decline starting at about 30.

What does it all mean?

You Knew I couldn't resist

If I want to build a simple model of future player performance I work under the following assumptions:

  • Players’ ADJP48 will remain consistent over time, with noise on a year-to-year basis
  • Age is important towards the end and beginning of a career
  • Minute Allocation remains consistent over time

So with that I take the average minutes played over the last three years for each player (to filter out year-to-year noise), the weighted average of ADJP48, and the position adjustment for the last year, and project a win model. The model is good enough for a quick take but not as good as something I would do for a season projection. The weaknesses here are:

  • Old or young players (Collison and Posey come to mind)
  • Minute Allocation by coaches.
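
The quick model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the weighting scheme is assumed to be minutes-weighted (the post does not say which weights are used), `position_adjustment` stands in for whatever is subtracted from ADJP48 to get WP48 (the full Wins Produced formula is not reproduced here), and the names `Season` and `project_wins` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Season:
    minutes: float  # minutes played that season
    adjp48: float   # raw productivity per 48 minutes


def project_wins(seasons, position_adjustment):
    """Project wins from a player's last three seasons.

    Assumptions (not from the post): ADJP48 is weighted by minutes, and
    WP48 is approximated as ADJP48 minus a single position adjustment.
    """
    total_minutes = sum(s.minutes for s in seasons)
    avg_minutes = total_minutes / len(seasons)  # smooths year-to-year noise
    weighted_adjp48 = sum(s.minutes * s.adjp48 for s in seasons) / total_minutes
    wp48 = weighted_adjp48 - position_adjustment
    return wp48 * avg_minutes / 48


# Made-up numbers for illustration only.
last_three = [Season(1000, 0.30), Season(2000, 0.40), Season(3000, 0.35)]
print(project_wins(last_three, position_adjustment=0.25))
```

Because minutes are simply averaged, the sketch inherits both weaknesses listed above: it has no age term and it assumes the coach keeps handing out the same minutes.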

If I were a GM, I’d have a stat department and I would build a more robust model for this kind of transaction. I also would not be doing quips on the internets (at least not under my real name).

14 Comments

  1. Tom Mandel
    8/12/2010

    Good work — it would be even better, however, if it supported the choice of data sets that I questioned! :) You didn’t evaluate Ariza’s value in the trade using this model — and for sure it affects Troy Murphy’s value in the trade.

    Moreover, you are still in the “turtles all the way down” mode whereby we use a *statistically* valid fact as if it translated directly into information about an individual instance. In Trevor Ariza’s case, look at *his curve* of productivity from year to year. I’m sure you are familiar with the concept of “regression to the mean.” Why not apply *that* principle in choosing a dataset. In fact, regression to the mean militates against “Players ADJP48 will remain consistent over time with noise on a year to year basis” — or am I wrong about that? Yes, if we are using “players”, the curve keeps its shape. But we are *not* assessing “players”, we are assessing *one* player — Trevor Ariza.

    Let me approach it another way: suppose I smoked. That would put me in a class of people w/ an “n%” higher chance of dying from cancer, right? It’s *the class* to which the % bump applies. Can you say to me, “Tom, man, you have an n% higher chance of dying from cancer; you need to quit smoking”? No, of course not. *My* greater chance of dying from cancer is an unknown. And even that unknown is only of importance in my case if I don’t die of something else first!

    Now, the statistical information is *useful* to be sure — you would still be right to advise me to quit smoking. Who wants to be in the class w/ the cancer bump! But the statistical fact gives you no useful information about my particular case.

    Sorry to write at this length. WP48 is exceptionally useful. And, yes, NBA teams need stat departments (and to “think” in many other ways too!). But WP48 is not useful in many of the ways I see it being used.

  2. 8/12/2010

    Tom,
    The correlation between weighted avg. ADJP48 for 07-09 and actual ADJP48 for 2010 is 71.7% (Good enough for a quick and dirty calculation)

    Ariza’s numbers are:
    Year ADJP48
    2007 0.360
    2008 0.420
    2009 0.370
    2010 Projected 0.374
    2010 Actual 0.270
    2010 Projected 0.311

    So I feel justified in using .311 as a conservative estimate for Ariza.
    Now, as we advance, we’d like to start looking at players similar to Ariza (as we would for cancer risks) to make projections, but we’re not there yet.

  3. Tom Mandel
    8/13/2010

    You do good work, Arturo! :)

  4. TBall
    8/13/2010

    Arturo,

    You know what really grinds my gears? Unsubstantiated journalist rhetoric that somehow is supported solely on the echoes of other journalists. Or how coaches that win can spew fiction and have it taken as gospel because they won.

    Case in point. Doc did not rest his starters in preparation for the post season. The Big 4 averaged 36, 35, 34 and 30 minutes – 135 minutes cumulatively. That is the same average they have maintained over the previous two seasons, give or take a minute. For the love of our Celts, set the record straight. However, he gets to the Finals and everyone buys into Doc’s position that the 50-win season was part of the plan, despite the fact that minutes are a well-kept stat that anyone can check.

    Also, loved the historic adj48 average last Sunday. What league characteristics account for every position being below average some years and above average other years? Does that get to pace/number of possessions or is it more involved?

    • 8/13/2010

      TBall,
      You mean journalists parrot points mindlessly based on what everybody else is saying without actually analysing the actual data? You just rocked my world view; I need to sit down and have Joe Morgan talk me down.
      You’re right on the Celtics. Their problems in the regular season had to do with age, health and having the first three guys off the bench in terms of minutes (Sheed, Marquis and Big Baby) stink up the joint to the tune of 2.3 Losses produced, not “resting the starters”. They couldn’t put their best team on the floor, but it wasn’t part of a plan. In the playoffs most of these issues went away (the Big Three, T. Allen and Big Baby were healthy, and Sheed actually ran basket to basket after the Miami series) and they rolled with their best team until game 6 of the Finals. (I wrote a post on this here: http://dberri.wordpress.com/2010/05/16/lebron-cavs-in-pomp-and-circumstance/)

      As for what moves the historical ADJP48, it’s three things as far as I can tell (in order of importance): talent, pace and the rules (3-point distance, hand checks, traveling). I didn’t pace-adjust in that piece but I have before. Go take a look here for that view (http://arturogalletti.wordpress.com/2010/07/19/measuring-the-quality-of-basketball-in-the-nba-part2-adjusting-for-pace/)

  5. Guy
    8/15/2010

    Arturo: I’m afraid there’s a significant problem in calculating aging curves as you have here. The problem is survivor bias: those players still good enough to play at each age after peak age will disproportionately be those players who “age well,” that is who decline slower than average. At each successive age, the problem grows larger. The only players still playing at age 35 are guys who aged very well (and they are also generally players with very high peaks, of course).

    The way to see this is to look at the N at each age. If your aging curve were correct, most players who were good enough to play at 30 would still be good enough to play at 33 — but in reality, half of them have disappeared. What is happening is that many of those 30 year-olds decline rapidly in the following couple of years, and leave your samples. Your regression is looking only at the survivors, and thus concludes that NBA players experience only a “gradual decline” after age 30. That’s true once you know who the survivors were. But in making future projections, you don’t. So you need aging curves that decline much more rapidly, to account for those players you are missing.

    A lot of work has been done on athletes’ aging curves, and it involves confronting a host of complicated selective sampling issues. Even work done by some highly-regarded academics has failed to grapple successfully with this survivor bias. You might find it interesting to review that literature.

    • 8/15/2010

      Guy,
      Great to hear from you. I am aware of survivor bias. This data set in particular looks only at players with 10+ year careers, as an apples-to-apples comparison. The data set parameters were:
      * For this analysis I looked at all players drafted from 1977 on who played a minimum of ten seasons at more than 400 MP.
      * I based the analysis on ADJP48 (i.e. non-position adjusted WP48 see here for an explanation)
      * I standardized all players based on their Peak ADJP48. So all players are listed based on % vs their Peak Year.
      * I divided players by Year Born.
      This gives me a more homogeneous population to test.
      What is an interesting finding is that the peak age for this population is rising over time:
      http://arturogalletti.files.wordpress.com/2010/07/peak-ages.png
      So two things come to mind as conclusions. Medicine is helping extend careers, and there’s less risk in investing in an older player (if he’s good enough to stick around the league).
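
A minimal sketch of the filtering and peak-standardization steps listed above, assuming each player's career is available as a list of (age, minutes, ADJP48) tuples. The function name, the data layout and the sample career are hypothetical; only the 400 MP and ten-season thresholds come from the comment.

```python
def percent_of_peak(seasons, min_minutes=400, min_seasons=10):
    """Express each qualifying season as a fraction of the player's peak.

    seasons: list of (age, minutes, adjp48) tuples for one player.
    Returns {age: adjp48 / peak}, or None when the player has fewer than
    min_seasons seasons above min_minutes. Assumes the peak is positive.
    """
    qualifying = [(age, adjp48) for age, minutes, adjp48 in seasons
                  if minutes > min_minutes]
    if len(qualifying) < min_seasons:
        return None  # career too short for the apples-to-apples sample
    peak = max(adjp48 for _, adjp48 in qualifying)
    return {age: adjp48 / peak for age, adjp48 in qualifying}


# Made-up ten-year career that peaks at age 26.
career = [(22 + i, 2000, v) for i, v in enumerate(
    [0.25, 0.30, 0.35, 0.38, 0.40, 0.39, 0.37, 0.36, 0.33, 0.30])]
print(percent_of_peak(career))
```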

      • Guy
        8/16/2010

        I agree on your first conclusion: since your methodology was consistent, the finding that peak age is rising is likely to be real. And that’s an important discovery if true.

        I’m much less sure about your second conclusion, of low risk in older players. Here is where the survivor bias becomes a large problem. You are only studying the success stories, those who play regularly for 10+ years. Say you are trying to forecast what a 30-yr-old player will do over the next 3 years. Some of these players will decline rapidly, doing poorly at 31, missing your 400 MP minimum at 32, and out of NBA at 33. But your age 32 and 33 estimates aren’t impacted by this player at all. This will definitely result in underestimating the downside risks of such players.

        One approach to consider: combine this with your replacement level work. For all 400+ MP 30-yr olds, if they fall below 400 MP at later ages keep them in the sample as replacement-level players — surely that is still an optimistic assessment of their ability at that age. If you do this, I believe you will find a much faster decline in average performance.

        I understand you think you are looking only at “good players” who age more gracefully. But my guess is that even if you look only at 30-yr-olds who play at least 1000 minutes, you will still discover that a non-trivial number declined rapidly over the next 3-4 years.
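
The replacement-level idea above can be illustrated with a toy comparison. This is a sketch only: the careers, the replacement value of 0.40 and the function name are invented for illustration, and performance is expressed as a fraction of peak.

```python
def cohort_average(careers, base_age, target_age, replacement_level=None):
    """Average performance at target_age for players who qualified at base_age.

    careers: list of dicts mapping age -> performance, containing only the
    seasons that met the minutes cutoff.
    replacement_level: if given, players missing at target_age are counted
    at this value; if None, they are dropped (survivors only).
    """
    cohort = [c for c in careers if base_age in c]
    values = []
    for career in cohort:
        if target_age in career:
            values.append(career[target_age])
        elif replacement_level is not None:
            values.append(replacement_level)
    return sum(values) / len(values) if values else None


# Toy data, made up for illustration.
careers = [
    {30: 0.80, 31: 0.78, 32: 0.76, 33: 0.74},  # ages well
    {30: 0.80, 31: 0.75, 32: 0.72, 33: 0.70},  # ages well
    {30: 0.80, 31: 0.60},                      # out of the sample after 31
    {30: 0.80},                                # out of the sample after 30
]
REPLACEMENT = 0.40  # hypothetical replacement-level value

survivors_only = cohort_average(careers, 30, 33)
with_dropouts = cohort_average(careers, 30, 33, REPLACEMENT)
print(survivors_only, with_dropouts)
```

In this toy sample the survivors-only average at 33 sits well above the average that keeps the dropouts in, which is the direction of bias described in the comment.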

        • 8/16/2010

          Guy,
          You’re right on the second part; I wasn’t clear enough. I meant older than 24 to about 31 (the natural place for the first long-term free agent contract). Attrition hits at 31 and gets worse from there (granted, the 31-year-olds I’m looking at are by nature an older data set; it will be interesting to see how current players do).
          Age  Count of Players  Avg vs Peak  Survival
          26   188               80.1%
          27   183               76.2%        97%
          28   178               80.4%        97%
          29   171               77.6%        96%
          30   175               75.2%        102%
          31   149               73.3%        85%
          32   116               75.2%        78%
          33   81                72.6%        70%
          34   57                68.9%        70%
          35   33                73.5%        58%
          36   16                75.2%        48%
          37   9                 65.6%        56%
          38   1                 69.6%        11%
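
The Survival column in the table above appears to be each age's player count divided by the previous age's count. A short sketch that recomputes it from the counts as listed, and also expresses the counts relative to age 30 (the second view is an addition for illustration, not part of the original table):

```python
ages = list(range(26, 39))
counts = [188, 183, 178, 171, 175, 149, 116, 81, 57, 33, 16, 9, 1]

# Year-over-year survival: count at this age / count at the previous age.
year_over_year = [later / earlier
                  for earlier, later in zip(counts, counts[1:])]

# Share of the age-30 count still in the sample at each later age.
count_at_30 = counts[ages.index(30)]
relative_to_30 = {age: n / count_at_30
                  for age, n in zip(ages, counts) if age >= 30}

for age, rate in zip(ages[1:], year_over_year):
    print(age, f"{rate:.0%}")
```

The count at 30 is higher than at 29, so these are not strictly nested cohorts, and the ratios are better read as rough attrition rates than as true survival probabilities.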

  6. Raspu10
    8/15/2010

    @Tom,

    If the average length of an NBA career is 5 years, which it is more or less, then you’re dealing with 50% loss much earlier than age 30. It seems to me, as untutored as I am, that this curve could be overlaid on length-of-career data to create a better picture, but is it necessary for this to be useful?

    At the older end of the curve, you already know the player, and have substantial data on their costs and production – you’re not buying potential. You have enough data to calculate what their peak probably was, and you can be pretty sure in general that it’s behind them. Unless they are Jason Kidd, anyway. Any player you are concerned with over 32 is by definition exceptional. The attrition rate guarantees that. More importantly, we can model their historical value curve based on individual history rather than statistical projections.

    For players in the 24-30 range, you have a good chance they’re at or close to their peak. In this range, you would evaluate players for a combination of production and potential. That is – analyze their career data for not only actual yearly production, but how it compares to their yearly peak production expectation and average peak production expectation. You might go so far as to generate similar charts for position, and see how the curve looks for, say, point guards.

    It’s the younger players where projections have the greatest value, because we have the least individual data. If we are modeling a 19 year old vs a 22 year old, the percent for peak production matters, because we have an expectation of meeting unproven potential.

    So yeah, if you want to model decline in athletic performances for an age cohort, the attrition matters. If you are applying the model to NBA personnel decisions, not so much.

    • 8/15/2010

      Nods head. Yep. At 24 and up you know who a guy is and who he will be. You can use this to mitigate the risk in your investment. The age curve is why someone like Cuban defensively gave up on Nash (who, if you read about him, is an insane workout/health freak, and that makes him an outlier).
