Yesterday Kirk Goldsberry, contributor at Grantland, put up an impressive, super cool, honestly just exciting piece about the recent developments in the use of big data in the NBA. The timing was perfect. In writing “DataBall”, Goldsberry essentially said, “Hey Colin! Feeling down without football? Need to be caught up with the awesome stuff happening in basketball, which you know much less about? Here is this great article about the current state, and future potential, of basketball analytics.”
Cool, right? This is a positively exhilarating time for the NBA, or at least for nerds who like basketball, or at the very least NBA executives and coaches who like winning. The spread of improved technology will make this season produce more data than any year in basketball history. Though a little daunting to work with, the data are useful, useful useful useful, in a practical sense, and can quantify essential, previously unquantifiable player traits and skills.
A Game of Big Men, and Bigger Data
SportVU technology, of STATS LLC, tracks the movements of every player on the court. Constantly. Precisely. Using SportVU, one can make a replica of the play like the one below: Tony Parker’s assist to Kawhi Leonard’s game-winning three in a February 13th, 2013 game San Antonio played in Cleveland. Check out Goldsberry’s article to actually run the continuous animation; these are just screen shots.
Very cool. Very very cool. This animation exists because Cleveland was one of fifteen NBA teams to have SportVU cameras and what-not installed in their arena last season. But before this season, the NBA installed SportVU in every arena. This season there will be data on everywhere a player goes, every time he steps on the floor. Last season, in half the games, SportVU produced 800 million player locations. The academics Goldsberry speaks of, presenting at MIT’s Sloan Sports Analytics Conference later this month, worked with 93 gigabytes drawn from last season’s data alone.1 And there will be twice as much this year, and in years to come.
Future Discoveries of NBA Basketball
It is foolish to quantify a player’s talent with a single number, and equally foolish to think the league won’t learn a lot from this newfound data. Which players create the best possessions for their teams? No longer must the answer be gleaned from filtering assists, shot charts, player efficiency ratings, and whatnot. Using SportVU tracking, with over a billion player positions every season, different floor positions can be assigned probabilities for different outcomes. A player open under the basket with the ball has a high probability of scoring two points; without the ball, a slightly lower probability, depending on his chances of receiving a pass; closely guarded but with the ball, a different probability again, based on his shooting percentage, his defender’s prowess, etc.
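To make that concrete, here is a minimal Python sketch of how an expected point value might be computed from outcome probabilities. Every number in it is invented for illustration, and the `expected_points` helper is hypothetical; none of it comes from SportVU data or from the researchers' actual model.

```python
def expected_points(outcome_probs):
    """Expected points for a mapping of {points scored: probability}."""
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(points * prob for points, prob in outcome_probs.items())


# Hypothetical probabilities, made up for this example.
open_under_basket = {2: 0.70, 0: 0.30}  # open, with the ball
closely_guarded = {2: 0.40, 0: 0.60}    # has the ball, defender draped on him

# Without the ball, the player first has to receive a pass.
# The 0.80 completion chance is also an assumption.
pass_arrives = 0.80
off_ball_value = pass_arrives * expected_points(open_under_basket)

print(expected_points(open_under_basket))
print(off_ball_value)
print(expected_points(closely_guarded))
```

The off-ball case is simplified on purpose: it treats a failed pass as worth zero points, which a real model would not do.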
Every game state–the location of all ten players and the ball, in relation to each other and their respective baskets–has an expected point value for both teams. If this sounds like how Brian Burke of Advanced NFL Stats builds his Win Probability Calculator, his Fourth Down Calculator, etc., that is because it is fundamentally the same analysis.2 But because basketball is a little simpler, having only ten players out there, and because each NBA team has around a hundred possessions every game, and plays 82 games a season, the analysis can become way more
With a good estimate of the expected value of every game state in the NBA, breakdowns like the following become possible:
Staring at this graphic is just… enthralling. Look at it! Aaaauuuggghhhhhh!!! Who is the best passer in basketball? No longer is “Well, Player X has the most assists” or “Player Y has the lowest turnover rate” or “A team’s most points per possession come when Player Z mans the point” the best we can do. Now, we can say “Player A made a pass that maximized his team’s expected possession value on 94 percent of his passes, while Player B made a pass that maximized his team’s value on only 82 percent of his passes.” No, it does not have the same ring to it, but damn, is it sexy?!?
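For the curious, a stat like "maximized his team's expected possession value on 94 percent of his passes" could be tallied roughly as follows. This is only a sketch: the record layout, the `optimal_pass_rate` helper, and all the numbers are assumptions made up for the example, not the format of any real tracking data.

```python
def optimal_pass_rate(passes):
    """Share of passes that went to the highest-value available option.

    Each pass is a dict with the chosen target and the expected
    possession value (EPV) of every option open at that moment.
    """
    if not passes:
        raise ValueError("need at least one pass")
    optimal = 0
    for p in passes:
        best_value = max(p["options"].values())
        if p["options"][p["chosen"]] == best_value:
            optimal += 1
    return optimal / len(passes)


# Hypothetical passes by one player; the EPV numbers are invented.
player_a_passes = [
    {"chosen": "corner", "options": {"corner": 1.25, "post": 1.00, "wing": 0.90}},
    {"chosen": "post", "options": {"corner": 0.95, "post": 1.10, "wing": 1.00}},
    {"chosen": "wing", "options": {"corner": 1.20, "post": 1.00, "wing": 0.85}},
]

print(f"{optimal_pass_rate(player_a_passes):.0%} of passes maximized EPV")
```

Counting only exact maximizers is a harsh standard; a softer version might credit any pass within some margin of the best option.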
Forget passing, which player is the best decision maker?3 In the above image, Leonard’s shot probability is tied for the most likely outcome, even though by expected possession value it is his worst option; passing to anyone would be better. And these numbers can be tailored to individual players! With substantial sample sizes for individual players over the course of thousands of possessions, we do not have to settle for “Shooters make X percent of open corner threes”; we can specify that “Player Y makes Z percent of open corner threes”. Which point guard best understands his teammates’ strengths and weaknesses, the differences between the starters and the subs, etc.? Which big man adds the most value when getting the ball at the post?
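The jump from "shooters make X percent" to "Player Y makes Z percent" is just a matter of grouping the same shot log by player. Here is a small Python sketch; the shot log, the player labels, and the `make_rates` helper are all hypothetical.

```python
from collections import defaultdict


def make_rates(shots):
    """Return (league-wide make rate, per-player make rates).

    `shots` is a list of (player, made) pairs for one shot type,
    e.g. open corner threes.
    """
    if not shots:
        raise ValueError("need at least one shot")
    made = defaultdict(int)
    attempts = defaultdict(int)
    for player, was_made in shots:
        attempts[player] += 1
        made[player] += int(was_made)
    league_rate = sum(made.values()) / sum(attempts.values())
    player_rates = {p: made[p] / attempts[p] for p in attempts}
    return league_rate, player_rates


# Invented open-corner-three attempts, far too few to mean anything.
shot_log = [
    ("Player Y", True), ("Player Y", True), ("Player Y", False),
    ("Player Q", False), ("Player Q", True),
]

league_rate, player_rates = make_rates(shot_log)
print(f"All shooters: {league_rate:.0%}")
for player, rate in player_rates.items():
    print(f"{player}: {rate:.0%}")
```

With only a handful of attempts the per-player numbers are noise, which is exactly why the thousands of tracked possessions matter.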
The answers to these questions will not be 100 percent perfect; a single number, or even a combination of numbers, is unlikely to completely quantify what a player brings to the floor.4 But we will know more than we do now, in a truly unprecedented way. The moral of the story is: with football over, I will be watching more basketball now. What perfect timing.
- Ninety-three gigabytes is a lot of data. For some perspective: the entire Lord of the Rings trilogy, the extended editions, on Blu-ray at 1080p, is 12 gigabytes. The complete series of Breaking Bad is 40.3 gigabytes. Of course, the Library of Congress estimates that it adds five terabytes of content a month, or 93 gigabytes every 13 hours or so. ↩
- The Markov model, kids. Read about it. ↩
- My money is on LeBron James. Remember in the 2011 Finals when everyone shamed James for passing off to Wade in clutch moments? Maybe those passes were smart! Or, maybe they actually were terrible. From now on, with unprecedented objective data, we will have a much better idea. ↩
- Obligatory reminder: if a player sells out the house night after night, does his owner still care as much about his less-than-optimal expected possession value added? Probably not. ↩