NBA GOLD: a Response to the NYT Debate

To close out the 2010s, The New York Times published an article debating the NBA’s Greatest of the Last Decade (GOLD). (Here’s the link to the article.) The debate was whether the NBA GOLD was LeBron James or Steph Curry. With several championships under their belts, both players are legendary in their own right, but aside from a gaggle of sportswriters' opinions and observations, is there any data behind the NYT's GOLD? Should these two really be in the running for GOLD in the first place?

So once again, Xcalar is bringing a little DATA and a little rigor to this debate... (reread our first NBA GOAT blog post)

tl;dr: jump to our NBA GOLD

The Methodology

STEP 1: Collect Play-By-Play Data from the Last Decade

We collected the last ten years of play-by-play data from BigDataBall. The data is delivered in CSV format for each of the following seasons:

  • 2008-2009
  • 2009-2010
  • 2010-2011
  • 2011-2012
  • 2012-2013
  • 2013-2014
  • 2014-2015
  • 2015-2016
  • 2016-2017
  • 2017-2018
  • 2018-2019
  • 2019-2020 (current season games)

These files are uploaded into a dedicated S3 bucket: over 2GB of data, representing over 6.6 million play-by-play events.

NBA PBP Data, 2008-2020

The raw data comprises 45 columns that capture every discrete event during a game - possessions, blocks, steals, time elapsed, the players on the court, and the x-y position of field goal attempts. As expected, this data is plentiful, but not exactly in a form that's easy to work with. Sometimes the actual data is buried in an unstructured text string (2pt vs. 3pt shots were in a description field). At other times, we had to work backwards to determine which team had the ball (the play-by-play is player-based, not team-based, so we had to do a lookup).
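
As a rough illustration of that kind of cleanup, here is a minimal Python sketch that pulls the shot value out of a free-text description. The description strings and the regular expression are made up for this example; the actual wording in the BigDataBall feed may well differ.

```python
import re

# Assumption: three-point attempts are flagged in the text with a token
# such as "3pt". The real description format may differ.
THREE_POINT_PATTERN = re.compile(r"\b(3pt|3-pt|three point)\b", re.IGNORECASE)


def shot_value(description: str) -> int:
    """Return 3 if the description mentions a three-point attempt, else 2."""
    return 3 if THREE_POINT_PATTERN.search(description) else 2


# Hypothetical description strings, for illustration only.
examples = [
    "Curry makes 3pt jump shot from 27 ft",
    "James makes driving layup",
]
values = [shot_value(d) for d in examples]
```

Keeping the pattern in one named constant makes it easy to adjust once the real description wording is known.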

STEP 2: Calculate the Desired Statistics/Criteria

In sportswriting and pundit rooms around the country, there is a lot of emoting and feeling, backed by cherry-picked "stats." For every GOLD candidate, there is a statistic that can prove or disprove the assertion. What we wanted to do was take away the feeling, the bravado, and the ego, and let data determine GOLD.

But doing so requires a whole slew of statistics - sliced and diced in countless ways. Based on the 6.6+ million records over the past decade, we calculated these statistics and ranked each player on them.

DEFENSE:

  • Assists,
  • Blocks,
  • Rebounds,
  • Steals
Xcalar dataflow for building defense statistics.
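
For readers who want a feel for the counting-and-ranking step outside of a dataflow tool, here is a small Python sketch. The event tuples and player names are placeholders, and the dense ranking (ties share a rank) is our assumption for the example, not necessarily how the Xcalar dataflow handles ties.

```python
from collections import Counter

# Hypothetical, simplified events: (player, event_type).
events = [
    ("Player A", "block"),
    ("Player B", "steal"),
    ("Player A", "rebound"),
    ("Player A", "block"),
    ("Player C", "block"),
]


def rank_players(events, event_type):
    """Rank players by how many events of the given type they recorded.

    Returns {player: rank}. Rank 1 is the highest count; ties share a rank.
    """
    counts = Counter(player for player, kind in events if kind == event_type)
    distinct_counts = sorted(set(counts.values()), reverse=True)
    rank_of_count = {count: i + 1 for i, count in enumerate(distinct_counts)}
    return {player: rank_of_count[count] for player, count in counts.items()}


block_ranks = rank_players(events, "block")
```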

OFFENSE:

  • 2pt percentage,
  • 2pt total points,
  • 3pt percentage,
  • 3pt total points,
  • free throw total points,
  • free throw percentage,
  • total all-time points,
  • total playoff points
Offense Statistics
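
The percentage and total-points statistics above can be sketched in a few lines of Python. The shot records below are invented for illustration, and the (player, shot value, made) layout is an assumed simplification of the raw play-by-play columns.

```python
from collections import defaultdict

# Hypothetical shot records: (player, shot_value, made).
shots = [
    ("Player A", 3, True),
    ("Player A", 3, False),
    ("Player A", 2, True),
    ("Player B", 2, True),
    ("Player B", 2, False),
    ("Player B", 2, False),
    ("Player B", 2, True),
]


def shooting_stats(shots):
    """Return {(player, shot_value): (make_percentage, total_points)}."""
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for player, value, made in shots:
        attempts[(player, value)] += 1
        if made:
            makes[(player, value)] += 1
    # Every key in `attempts` has at least one attempt, so no division by zero.
    return {
        key: (makes[key] / attempts[key], makes[key] * key[1])
        for key in attempts
    }


stats = shooting_stats(shots)
```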

Xcalar's resident basketball analyst, Omar Agha, also suggested that we include some additional non-standard criteria to reflect the "clutch" nature of a Greatest of the Last Decade (GOLD) player. So we also included Q4-specific statistics:

Q4 STATISTICS:

  • percentage of points made in 4th quarter,
  • total Q4 points,
  • Q4 assists,
  • Q4 blocks,
  • Q4 FG percentage,
  • Q4 rebounds
Q4 specific statistics to showcase "clutch" play.
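
Here is one possible reading of the first two Q4 statistics, sketched in Python: total Q4 points, and the share of a player's points that came in the fourth quarter. The scoring events are invented, and treating only quarter 4 (not overtime) as "Q4" is an assumption of this example.

```python
# Hypothetical scoring events: (player, quarter, points).
scoring = [
    ("Player A", 1, 2),
    ("Player A", 4, 3),
    ("Player A", 4, 3),
    ("Player B", 2, 2),
    ("Player B", 4, 2),
]


def q4_summary(scoring):
    """Return {player: (q4_points, share_of_points_scored_in_q4)}."""
    total, q4 = {}, {}
    for player, quarter, points in scoring:
        total[player] = total.get(player, 0) + points
        if quarter == 4:  # assumption: overtime periods are not counted
            q4[player] = q4.get(player, 0) + points
    return {
        player: (q4.get(player, 0), q4.get(player, 0) / total[player])
        for player in total
        if total[player] > 0  # skip players with no points to avoid 0/0
    }


summary = q4_summary(scoring)
```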

STEP 3: Apply Weightings on Each Statistic

Not all statistics are created equal, and as noted in the New York Times article, the statistics that determine the latest GOLD are different from those of the '00s - specifically, with heavier reliance on the 3-pt shot.

So we looked to Xcalar’s senior basketball analyst, Omar Agha, to determine the weightings that best reflect the game as it has been played for the past 10 years. To make the weightings quick and simple to adjust, we captured them as parameters instead of hardcoding the weights in the dataflow nodes. The weights are shown at the right:

Because of the emergence of the 3-point game, you can see that we put greater weight on 3-point percentage and 3-point total points. Additionally, we placed heavier weight on fourth quarter statistics - specifically percentage of points made in the fourth quarter and total points made in the fourth quarter.

Once these weightings were entered as parameters, we then applied them to each statistic and summed them up for a total point score.
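
In plain Python, the idea of parameterized weights and a summed score might look like the sketch below. The weight values, stat names, and per-stat scores are all placeholders invented for the example (they are not the weights from our dataflow), and we assume each per-stat score is already on a "higher is better" scale, for instance derived from rankings.

```python
# Hypothetical weights kept in one place so they can be tuned without
# touching the scoring logic. These are NOT the actual weights used.
WEIGHTS = {"three_pt_pct": 2.0, "q4_points": 1.5, "blocks": 1.0}

# Hypothetical per-stat scores (higher is better).
player_scores = {
    "Player A": {"three_pt_pct": 10, "q4_points": 8, "blocks": 4},
    "Player B": {"three_pt_pct": 6, "q4_points": 10, "blocks": 7},
}


def total_score(stat_scores, weights):
    """Weighted sum of one player's per-stat scores."""
    return sum(weights[stat] * score for stat, score in stat_scores.items())


totals = {player: total_score(s, WEIGHTS) for player, s in player_scores.items()}
leaderboard = sorted(totals, key=totals.get, reverse=True)
```

Because the weights live in a single dictionary, trying a different weighting is a one-line change followed by a re-run.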

Relative Weightings
Calculating GOLD

... Xcalar's Data-Driven Greatest Of the Last Decade (GOLD)

Congratulations on making it this far (or jumping down from the start). Based upon 6.6+ million play-by-play events, analytics, and weightings from our resident NBA expert, our GOLD list is as follows:

Xcalar GOLD
Xcalar's NBA GOLD

Sorry NYT, but we're gonna disagree. Based upon data, we have this past decade's GOLD as Kevin Durant - followed closely by Stephen Curry and THEN LeBron James. When we dive into the specific statistics that pushed KD over the top, we notice that he was relatively strong across all statistics and did not have any significantly poor rankings (unlike LeBron, with his weak 3-point and free throw percentages). With the 3-point shot being so important in the evolution of the game this past decade, and weighted more heavily as a result, LeBron's weak rankings in these stats were what kept him from the top spot.

KD FTW!
comparison across key statistics

But wait, there's one more thing...

Of course, the NYT focused mostly on one matchup for their GOLD - Steph Curry vs. King James. While neither is our GOLD player, we wanted to see what our data would say about these two. Using the same data set, we filtered down to the play-by-play events (11k+) where both Curry and James were on the court at the same time, and calculated their effectiveness against each other.

James vs Curry
head-to-head Curry vs. James
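
To make the head-to-head filter concrete, here is a minimal Python sketch. The event dictionaries, field names, and "Player X" are hypothetical stand-ins for the on-court player columns in the raw data, and points are the only metric tallied here.

```python
# Hypothetical events: who was on the court, who scored, and how many points.
events = [
    {"on_court": {"Stephen Curry", "LeBron James", "Player X"},
     "scorer": "LeBron James", "points": 2},
    {"on_court": {"Stephen Curry", "Player X"},
     "scorer": "Stephen Curry", "points": 3},
    {"on_court": {"Stephen Curry", "LeBron James"},
     "scorer": "Stephen Curry", "points": 3},
    {"on_court": {"LeBron James", "Stephen Curry"},
     "scorer": "LeBron James", "points": 2},
]


def head_to_head(events, a, b):
    """Keep only events with both players on the court, then total their points."""
    shared = [e for e in events if {a, b} <= e["on_court"]]  # subset test
    points = {a: 0, b: 0}
    for event in shared:
        if event["scorer"] in points:
            points[event["scorer"]] += event["points"]
    return len(shared), points


shared_count, points = head_to_head(events, "Stephen Curry", "LeBron James")
```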

When these two players are head-to-head, the data shows that it's a pretty even match-up, but that LeBron is able to eke out 1) more points and 2) higher FG percentage than Steph.

So if KD were out of the picture, King James would reign supreme as GOLD.

Curry vs James

Interested? Intrigued?

Have a bone to pick with our methods?

We love sports and we love analytics. If you'd like to get an instance with this exact data set, the dataflows AND the results, and play with it yourself - drop us a line! We'd love to share our analysis and get your feedback on how we can make it even better.

Drop Omar Agha, our resident NBA analyst, a line at sales@xcalar.com

Free 10-day access to the data, the dataflows, the analytics... no questions asked.

(Unless you want to talk sports, of course.)