NBA Draft

To What Extent Should We Value Collegiate Production? | Exploring the Predictive Power of a New Way to Scout NBA Draft Prospects

By David Lee


Draft season challenges its faithful with a paradox every cycle: the most devoted evaluators must somehow cast a wide net to uncover underrated gems while also exploring the nuances of each individual prospect’s profile in painstaking detail.

Last year’s class has produced 70 players who appeared in at least 1 game through December 28th, with 61 of these rookies playing their pre-draft season in the NCAA. Those 61 players combined to play ~2,100 total games and an immense ~60,000 minutes of on-court time. It would take forty-two (42!!) days of nonstop viewing to watch every minute of the final collegiate season of the newest crop of NBA players.

In a draft pool with ~5,500 NCAA Division 1 players, relying solely on film study to parse through the sample and distill it into a more digestible & informative subset is virtually impossible.

Of course, a pure eye-test approach is far more feasible for NBA scouts with refined philosophies, resources to more efficiently watch tape & connections with teams and clubs to gain insights on players, but the reality is this is not a viable approach in the slightest for the vast majority of aspiring evaluators. Add in the sparse nature of publicly available pre-NCAA data for incoming freshmen, and sadly most amateur draft evaluators are left with RSCI, national analysts’ opinions and the draft twitter echo chamber as the most dominant influences on their perceptions of each passing class.

RSCI is still a valuable sieve when used appropriately

These imposing barriers to entry - the lack of accessible data, the abundance of readily available surface-level analysis & highlight reels and the sheer number of prospects in the pool - have likely stopped many from making the daunting investment to develop their own individual draft philosophy, leaving them to rely entirely on consensus or fandom to drive their draft discussions. For the record, there’s absolutely nothing wrong with this; fans should absolutely be allowed to go to bat for their favorite players. But I do think this embrace of consensus as law has contributed to an extremely divisive, and in many ways disrespectful, draft space. If you don’t have this “consensus T10 prospect in the T10” (so many people need to see that Tawny Park is proving the most successful evaluators deviate from consensus) or WooPig616 doesn’t see 3 of the Razorbacks’ starters in the 1st round, then the legitimacy of your draft evaluation is questioned (with some obscenities mixed in for good measure).

some recent boards where the authors got berated for being original

Like most problems in life, the current state of draft discourse is a product of a flawed system, not necessarily the individuals themselves. It’s a gap I wanted to bridge with my early-season watchlist, in an effort to establish my own priors on guys headed into the season, while also sharing some pre-NCAA data and analysis to explain who I’m watching and what I’m watching for.

While I still believe the work I shared was helpful for both myself and hopefully most of those who read it, prospect evaluation is far more nebulous and fluid than a static list of players. I needed to build a dynamic watchlist that provided digestible & informative statistics that could empower individuals to form their own opinions in real time while simplifying the task of tracking hundreds of prospects for myself and the draft space as a whole. This realization was my first step towards building what would become The 5th Factor.

the homepage of my site, The 5th Factor

In 1848, Horace Mann famously wrote that “Education then, beyond all other devices of human origin, is the great equalizer of the conditions of men.” This idealized concept doesn’t hold much truth when applied to our highly stratified Western society as a whole, as the modern education system has far too many barriers to truly facilitate equitable distribution of capital. BUT this overall idea of “education” (in our case, the spreadsheets, i.e. statistics) leveling the playing field is 110% true in draft evaluation.


Even a cursory understanding of the most informative statistics that determine whether a player is suited to play NBA basketball drastically sharpens the scope of prospects to study, an absolute must in the absence of hyper-specialized data & film tools, collegiate program connections and the time to visually screen every potential prospect. Statistics are the ultimate contextualization method, and they become even more instructive when paired with competition filters, positional/archetypal adjustments and team/role context. My belief in the filtration power of the spreadsheets meant that this dynamic watchlist had to present as much data as possible, with numerous ways to truncate and parse through prospects on both the game and season average level.

The next question I had to answer was simple: how did I want to quantify production on both the game and season average level? Once I began pondering this, it quickly spiraled into a number of other questions: How can I distinguish “good” games from “bad” games? How can I avoid reinventing the wheel? What is production?

Generally speaking, there are 2 all-in-one metrics that are commonly referred to as measures of NCAA draft prospect viability: Box Plus/Minus (BPM) and Regularized Adjusted Plus-Minus (RAPM).

BPM variable weights & positional/team context adjustment from BBallReference

BPM is a pure box-score-based metric, most commonly found on barttorvik.com, that uses the most widely available data to estimate a player’s contribution in points above league average per 100 possessions played. This adherence to just box-score inputs is what makes BPM one of the most historically resilient public metrics available, but it has also made it a bit decrepit in NBA spaces, where projection metrics such as DARKO or hybrid metrics such as Estimated Plus-Minus have harnessed player-tracking-era data to fascinating results. BPM is still the standard for full-season collegiate evaluation because of its wide-ranging applicability, the length of players’ careers and thorough adjustments for position & team context, but it’s far more informative over a full season than it is for single games or even spans of games.

BPM study from @criggsNBA

RAPM is the antithesis of BPM: a metric that uses ZERO box-score inputs to estimate players’ impact. Regularized Adjusted Plus-Minus is a layered approach that begins with the foundational concept of “plus-minus” and stints (how a player’s team performs with them on the floor), “adjusts” for teammate quality, collinearity, clutch/garbage-time and homecourt advantage, then “regularizes” to account for low-minute players & outliers (ridge regression), shift the weightings of samples (Bayesian priors) or even account for coaching impact. If you’re interested in diving deeper into the calculation process, I’d recommend starting with this post and reading the linked resources.


At its core, RAPM is interpreting how a team performs with a player on the floor and attempting to isolate the individuals that are most driving that impact. RAPM and its derivatives are the gold standard for long-term player evaluation, often identifying players that are underrated contractually and reputation-wise, but it struggles mightily with small samples and is much more informative over 3 to 4 year samples. There have been good results derived from collegiate RAPM, but it’s even more unsuitable than BPM for game-by-game interpretation of the quality of a player’s performance.
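To make the “regularized” step concrete, here is a minimal sketch of the ridge-regression core of RAPM on a toy 2-on-2 dataset. Everything in it is an assumption for illustration: the player labels, the stint margins, the possession weights and the penalty values are made up, and real implementations use ten players per stint, usually split offense and defense, and add the adjustments described above.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ridge_rapm(stints, players, lam):
    """Possession-weighted ridge fit: (X'WX + lam*I) beta = X'Wy.

    Each stint is (side_a, side_b, margin_per_100_for_side_a, possessions).
    A player is coded +1 on side A, -1 on side B, and 0 when off the floor.
    """
    index = {p: i for i, p in enumerate(players)}
    n = len(players)
    A = [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for side_a, side_b, margin, poss in stints:
        row = [0.0] * n
        for p in side_a:
            row[index[p]] = 1.0
        for p in side_b:
            row[index[p]] = -1.0
        for i in range(n):
            b[i] += poss * row[i] * margin
            for j in range(n):
                A[i][j] += poss * row[i] * row[j]
    return dict(zip(players, solve(A, b)))


# Hypothetical stints: player "A" is on the winning side of every one.
players = ["A", "B", "C", "D"]
stints = [
    (["A", "B"], ["C", "D"], 10.0, 50),
    (["A", "C"], ["B", "D"], 6.0, 50),
    (["A", "D"], ["B", "C"], 8.0, 50),
]
light = ridge_rapm(stints, players, lam=100.0)   # weaker penalty
heavy = ridge_rapm(stints, players, lam=1000.0)  # stronger penalty
print(light)
print(heavy)
```

With these made-up numbers, player A comes out at roughly +4.0 under the lighter penalty and shrinks to roughly +1.0 under the heavier one. That shrinkage toward zero is why small samples, like a handful of college games, give RAPM so little to work with.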

With the 2 most contemporary options eliminated, I was contemplating my next move when I recalled a metric used by one of my favorite NBA Twitter data folks, Sravan. He used DRE (Daily RAPM Estimate) back in April 2024 as a rough measure of how well MVP candidates played on a game-by-game basis.

via @sradjoker

Uncovering this gem in my Twitter bookmarks also reminded me of Owen Phillips using DRE as a daily performance estimate during Summer League.

via @owenlhjphillips

DRE is a single-game composite metric that estimates whether a player had a good game, based entirely on their box score. You can think of DRE as Game Score’s more advanced cousin, as it derives the box-score weights from 14 years of RAPM data to better capture the statistical value of each box-score input (points, rebounds, assists, offensive rebounds, steals, etc.).

DRE description from BBall Ref

DRE is much more punitive than Game Score, dinging players more harshly for turnovers while also crediting players heavily for forcing steals, in addition to appropriately weighting 3-point field goal attempts & rebounding. Those improvements make DRE’s upper and lower bounds a good bit tighter than Game Score’s. Hollinger notes on Basketball-Reference that a 40 Game Score is an outstanding performance, while 10 is an average performance. The best prospect DRE performance in my database belongs to Ben Simmons, who recorded a 33.4 DRE on December 12th, 2015 by putting up 43 points, 14 rebounds, 7 assists & 5 stocks on 80.8 TS% with just 2 turnovers.

The WORST performance belongs to Tyreke Evans, who recorded a DRE of -12.6 on February 26th, 2009 by scoring just 8 points on 14 shots (26.9 TS%) with 0 stocks, 4 fouls and 9 turnovers. For reference, the average DRE among future NBA players in their collegiate seasons (in my database dating back to 2003) is 5.99. The calculations on my site are based on Kevin Ferrigan’s work in 2017: Updating DRE Tweaks.
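Both Game Score and DRE are linear box-score composites: multiply each box-score input by a weight and add the results. The sketch below shows that shape using Hollinger’s published Game Score weights, along with the standard true shooting formula behind the TS% figures quoted above. It does not reproduce the DRE weights; those come from Ferrigan’s work and would simply be swapped in as a different `weights` dictionary. The stat line is hypothetical.

```python
# Hollinger's Game Score weights as published in the Basketball-Reference glossary.
GAME_SCORE_WEIGHTS = {
    "pts": 1.0, "fgm": 0.4, "fga": -0.7, "ft_missed": -0.4,
    "orb": 0.7, "drb": 0.3, "stl": 1.0, "ast": 0.7,
    "blk": 0.7, "pf": -0.4, "tov": -1.0,
}


def linear_composite(box, weights):
    """Weighted sum of box-score inputs; stats missing from the line count as 0."""
    return sum(w * box.get(stat, 0) for stat, w in weights.items())


def true_shooting_pct(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)), expressed as a percentage."""
    return 100 * pts / (2 * (fga + 0.44 * fta))


# Hypothetical stat line, for illustration only.
line = {
    "pts": 20, "fgm": 8, "fga": 15, "ft_missed": 1,
    "orb": 2, "drb": 5, "stl": 2, "ast": 4,
    "blk": 1, "pf": 3, "tov": 2,
}
game_score = linear_composite(line, GAME_SCORE_WEIGHTS)
ts = true_shooting_pct(pts=20, fga=15, fta=5)
print(round(game_score, 1), round(ts, 1))
```

Because the structure is identical, the practical difference between the two metrics lives entirely in the weights, which is where DRE’s harsher treatment of turnovers and heavier credit for steals show up.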

In addition to utilizing DRE as a measure of production, I also wanted to explore the consistency of said production, using a concept called the Coefficient of Variation (COV%).

COV% and why I used the sample version

COV% is a measure of consistency that shows how much a player's performance (quantified by DRE) varies from game to game. Lower COV% indicates more consistent performance, while higher COV% suggests more variability. COV% is calculated as (sample standard deviation / sample mean) × 100. Because it uses each player's own mean rather than the population mean, COV% measures how variable a player's performance is relative to their own baseline, NOT the prospect/league average. This means that a player like Pat Ngongba can be MORE consistent than Cam Boozer, despite Boozer doubling Ngongba's average production.

Boozer & Ngongba’s season summary via The 5th Factor
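As a small worked example of the COV% formula, here is a sketch using Python’s standard library. The two game logs are invented (they are not Boozer’s or Ngongba’s actual numbers) and are chosen so that one player averages exactly double the other while being far less steady.

```python
from statistics import mean, stdev


def cov_pct(game_values):
    """Sample coefficient of variation: (sample std dev / sample mean) * 100."""
    avg = mean(game_values)
    if avg <= 0:
        # The ratio stops being meaningful when the mean is zero or negative.
        raise ValueError("COV% needs a positive mean")
    return stdev(game_values) / avg * 100


# Hypothetical per-game DRE logs.
high_producer = [14, 4, 12, 2, 13, 3]  # averages 8
steady_player = [4, 5, 3, 4, 4, 4]     # averages 4

print(mean(high_producer), round(cov_pct(high_producer), 1))
print(mean(steady_player), round(cov_pct(steady_player), 1))
```

The steadier player ends up with the much lower COV% despite half the average production, which is the same relationship described above. The guard against a non-positive mean is a design choice in this sketch, since single-game DRE can be negative and a mean near zero would blow the ratio up.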

Generally speaking, it is easier for big men and older prospects to remain consistent game to game, which makes consistent guard play/freshmen play that much more notable. Credit is due to both Frank and Sravan for their work quantifying game-to-game consistency on Twitter.

Part 2: Validating the Metric

At this point in developing the site, I was operating under the hypothesis that this measure of productivity (and how consistently productive a player is) would have some semblance of predictive power in terms of NBA translation. My thinking was that if a player consistently produces at their baseline against varying types of defenses, contexts and situations, they are more likely to replicate that on the NBA level. Production & consistency are also directly impacted by usage and role, which is something we’ll explore more in the limitations of DRE as an evaluation metric.

Once I got the site in a presentable state, I set out on trying to test my hypothesis, comforting myself by saying even if DRE doesn’t contribute positively to NBA outcome projection, my site could still fill a need as a data-driven watchlist.

I started with a basic correlation analysis to see whether there was any initial suggestion that collegiate DRE/COV% had translation to their NBA equivalents.

correlations without age/class adjustment
marginally improved correlations when EPM data is used to truncate the dataset

Correlations of .29 and .26 were substantial enough for me to continue exploring the statistical significance of collegiate production on both NBA production and, eventually, NBA impact. I ran a couple of regression models with 900+ NCAA -> NBA players using EPM’s estimated wins, age, class, etc., alongside the productivity and consistency measures and found some strong instances of predictive power in terms of NBA translation.

regression results

Generally speaking, a prospect’s DRE in college (and as a rookie) is predictive of both their Rookie/NBA DRE and NBA impact (measured by dunksandthrees estimated wins). A freshman producing prolifically and consistently is uber-indicative of NBA translation, but I’ve also found production translates regardless of class/age.

These results are consistent with the findings of similar BPM-related studies, and even anecdotal glances at some of the historical translations seem to hint at DRE’s predictive power. For example, here are the most consistent NBA players in my database and their collegiate/rookie consistency metrics (remember, LOWER is better):

reading through the list to see Shai at 10 legitimately got me excited

and here are the most productive collegiate NBA players and their rookie/career DRE + added wins via EPM’s estimated wins:

that Haliburton guy was destined to be pretty good

The distribution of seasons and minutes played across DRE values is also similar to that of BPM (though DRE will have a slight negative skew since the population only includes players who are SUPPOSEDLY the best of the best).

BPM 2.0 histogram via BBall Ref

The next thing I wanted to identify was whether DRE could identify a clear dropoff in NBA production/viability based entirely on collegiate production. This concept of an “undraftable threshold” is popular among BPM-truthers, though it (and my DRE equivalent) should not be used indiscriminately without properly accounting for a prospect’s context and role.

boxplot viz of NCAA DRE values and career eWins
line graph with confidence interval of Mean Career eWins at each NCAA DRE bucket

The table below and the prevalence of outliers on the box & whiskers plot illustrate an important caveat about setting a baseline for draftability based solely on BPM/DRE/insert metric: these thresholds for “replacement value” are extremely skewed by the high-end star players with long, productive careers. This is what’s causing the gap between the mean and median at every bucket interval.

dce77c02 cbce 41fb 9667 4b0863a41fc5 534x428

Now this is based on full career contribution to wins, so a study based on “peak season wins contributed” could help isolate this threshold more cleanly. Based on the current data, however, the average former NCAA player in my database contributed 10.15 eWins over the duration of their career. This suggests that the golden threshold is probably somewhere between an average DRE of 6 and an average DRE of 9. A player’s draftability should be heavily scrutinized, especially in the 1st round, if they are below a 5 DRE, while a DRE above 10 offers a great chance at selecting one of the better players of the class.

the 8 prospects currently above an 11 DRE (small sample, age & comp caveats are all relevant)
the 30 prospects currently below a 5 DRE (small sample, age & comp caveats are all relevant)
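The mean-versus-median gap behind these thresholds is easy to reproduce. The sketch below groups players into DRE buckets and reports both summaries of career eWins per bucket; the (DRE, eWins) pairs and the bucket width of 2 are assumptions for illustration, not values from the database.

```python
from collections import defaultdict
from statistics import mean, median


def bucket_summary(players, width=2.0):
    """Group (ncaa_dre, career_ewins) pairs into DRE buckets of the given width.

    Returns {bucket_floor: (mean_ewins, median_ewins, n_players)}.
    """
    buckets = defaultdict(list)
    for dre, ewins in players:
        floor = (dre // width) * width
        buckets[floor].append(ewins)
    return {
        floor: (mean(vals), median(vals), len(vals))
        for floor, vals in sorted(buckets.items())
    }


# Hypothetical players: one long-career star sits in the 6-8 DRE bucket.
sample = [
    (6.5, 1.0), (7.0, 2.0), (7.9, 3.0), (6.1, 60.0),
    (3.2, 0.0), (2.5, 1.0), (3.9, 2.0),
]
summary = bucket_summary(sample)
for floor, (avg, mid, n) in summary.items():
    print(f"DRE {floor:.0f}-{floor + 2:.0f}: mean={avg}, median={mid}, n={n}")
```

In this toy data a single star drags the 6-8 bucket’s mean up to 16.5 eWins while the median stays at 2.5, so reading a threshold off the mean alone would overstate what a typical player in that bucket goes on to do.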

Jesse Fischer (@jessefisher33) shared his findings on NCAA RAPM+ and its correlation with NBA Peak Wins about 3 years ago on Twitter. His work inspired me to visualize DRE’s correlation segmented by class on both eWins and total games played in the NBA.


The Overall Sample:

DRE vs Career eWins: r = 0.180 (n = 1020), vs Career Games: r = 0.128

Class Segmentation:

178eea01 8cc0 412e b794 789e226898f3 1920x1080

By Class Correlations:

Fr: r_ewins = 0.263, r_games = 0.169 (n = 200)

So: r_ewins = 0.246, r_games = 0.203 (n = 187)

Jr: r_ewins = 0.306, r_games = 0.166 (n = 187)

Sr: r_ewins = 0.191, r_games = 0.202 (n = 446)
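For anyone who wants to run the same kind of segmentation on their own data, here is a minimal sketch of a Pearson correlation computed overall and within each class. The rows are invented and deliberately extreme; the field layout (class, DRE, career eWins) is an assumption about how such a table might be stored.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_by_class(rows):
    """rows: (class_label, ncaa_dre, career_ewins) -> {class_label: r}."""
    groups = defaultdict(list)
    for label, dre, ewins in rows:
        groups[label].append((dre, ewins))
    return {label: pearson_r(*zip(*pairs)) for label, pairs in groups.items()}


# Hypothetical rows: seniors post higher DRE but lower career eWins.
rows = [
    ("Fr", 5, 10), ("Fr", 7, 20), ("Fr", 9, 30),
    ("Sr", 9, 2), ("Sr", 11, 4), ("Sr", 13, 3),
]
by_class = r_by_class(rows)
pooled = pearson_r([r[1] for r in rows], [r[2] for r in rows])
print(by_class, round(pooled, 3))
```

In this toy set the relationship is positive within each class but negative once the classes are pooled, because the older group produces more in college and less in the NBA. It is an exaggerated version of the pattern in the numbers above, where the overall r of 0.180 sits below every individual class.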

It’s immediately clear to me that DRE likely underrates freshmen production relative to RAPM because of the variability of the underclassmen’s roles. Juniors having the highest correlation and upperclassmen having more significant p-values is intriguing as well.

You’ll notice that all of my analysis has been focused upon the season average, with sparse takeaways from the game-to-game DRE production (i.e. consistency). This is mainly because COV% didn’t quite measure up to DRE in terms of predictive power, which is understandable considering how many variables go into game-to-game performances. Honing in on a prospect’s best and worst performances isn’t the most instructive strategy, even with the quantification of those performances by DRE; fluctuations should be expected within a small sample with these precocious prospects. It is certainly informative, however, to try and understand WHY a prospect struggles in certain games and contexts and thrives in others, and whether these struggles have reared their heads in past seasons. This is an area where the spreadsheets can’t help you until the sample size is large enough (particularly against t100/t220 comp) and the best practice is to cut on the tape.

Part 3: Limitations of the Metric

I got ahead of myself a bit at the end of Part 2, but I also want to discuss the limitations of DRE as a metric.

Because DRE has no plus-minus component and is entirely based on the box score, there’s no element of assessing how the individual’s performance impacted the team’s ability to win. Since we’re still in pre-conference play, the vast majority of standout performances have come in wins.

However, Tahaad Pettiford’s 2nd best game of the season came in a 29-point blowout loss vs Arizona. Being a -21 in a 29-point loss actually isn’t that bad, but it’s worth contextualizing his performance with Arizona casually leading Auburn by 30+ points for virtually the entire 2nd half.

Another limitation of DRE is that it can only really hint at HOW a player performed, and there are ways to “scam” an incredibly high Daily RAPM Estimate. Let’s take Tounde Yessoufou’s 27-point, 6-rebound performance on 11-of-15 shooting as an example. Baylor played Sacramento State on December 2nd, dominating them by 22 points while shooting 60% AS A TEAM from the field. Yessoufou’s 16.9 DRE ranked in the 96th percentile of prospect performances so far and in the 98th percentile historically (back to 2003).

There are ways to filter for the strength of a performance without cutting on the tape of course, from competition filters to simply gleaning the box score to see how many areas of influence a prospect had over the game, but what DRE can’t tell you definitively (yet) is that 15 of his 27 points came either in transition or off offensive rebounds. It also can’t tell you that only 1 of his buckets was unassisted (excluding the O-boards). Is there still value in generating turnovers, finishing plays and generating possessions on the glass? Absolutely. But the composition of his buckets should certainly discount this performance from being blindly considered as one of the better prospect games of the season.
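A percentile rank combined with a competition filter is simple to sketch. The game log below is hypothetical, including the opponent ranks, and the “at or below” definition of percentile is one common convention rather than necessarily the one used on the site.

```python
def percentile_rank(value, population):
    """Percent of values in the population that are at or below `value`."""
    at_or_below = sum(1 for v in population if v <= value)
    return 100 * at_or_below / len(population)


# Hypothetical single-game DRE values with the opponent's rank for each game.
games = [
    {"dre": 16.9, "opp_rank": 300},
    {"dre": 4.0, "opp_rank": 40},
    {"dre": 9.5, "opp_rank": 80},
    {"dre": 1.0, "opp_rank": 250},
    {"dre": 12.0, "opp_rank": 15},
    {"dre": 6.0, "opp_rank": 150},
    {"dre": 18.0, "opp_rank": 60},
    {"dre": -2.0, "opp_rank": 90},
]

all_dre = [g["dre"] for g in games]
top100_dre = [g["dre"] for g in games if g["opp_rank"] <= 100]

overall_pct = percentile_rank(16.9, all_dre)
vs_top100_pct = percentile_rank(16.9, top100_dre)
print(overall_pct, vs_top100_pct)
```

Comparing a score against only the games played versus stronger opposition changes the reference group, so the same 16.9 can look a little less rare. It still says nothing about how the points were scored, which is where the tape comes in.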

Most of the limitations of DRE as a metric exist because it’s intended to be used as a complement to the evaluator’s repertoire. Those who swear by film study can use the watchlist to refine which games they’d like to dive into, while the spreadsheet adherents can perform queries to search for historical & positional outliers across the game, span/streak and season average levels.

the streak finder on The 5th Factor (bring back early season Folgy smh)

I do have an NBA component that tracks NBA rooks & sophomores as well, with that tab serving almost entirely as a watchlist, as the NBA already has so many robust player evaluation metrics that are better suited to capture impact.

Part 4: Next Steps

I’ve alluded to it a bit over the course of this article, but I want to make The 5th Factor an ever-evolving platform that morphs as I find new data sources to enrich the watchlist. I don’t really want to turn DRE into a hybrid metric, but I do have plans to add four-factor influence as a separate component on the game, span/streak and season level (five-factor once I patent my transition influence idea).

I also want to remain true to the goals outlined in my philosophy manifesto on my 2025 draft board. I intend to create a clustering algorithm to classify the 2026 prospects’ playstyles, then search for analogues throughout my database.

my philosophy overview on The 5th Factor

My ultimate vision is the most data-rich draft site imaginable, complete with comparisons across production + consistency (DRE & COV%), playstyle, anthropometrics & team influence across similar collegiate prospects & NBA players. The end goal is to create a draft model that validates translation across those 4 spheres while accurately projecting prospects’ NBA roles.

For now though, I’ll end this article with an excerpt from my bio on The 5th Factor. Thank you so much for reading.

During my time at Georgia Tech, I was a proud member of the GTWBB Practice team, which gave me direct exposure to incredibly intriguing data, player development strategies, set play designs and the wide-reaching impact of analytics in basketball operations. I drew upon these concepts heavily as I refined my understanding of the game, utilizing my Industrial Engineering background and storytelling ability to weave together compelling data-driven narratives for every type of basketball fan.

In many ways, this site is a continuation of the learning approach I developed at Georgia Tech. Even the site’s name, The 5th Factor, is both a direct homage to Dean Oliver’s revolutionary discovery of the “4 Factors” that most impact winning - eFG%, TOV%, OREB% and FTr - and a nod to my desire to find the metaphorical “5th Factor” and beyond. I want to build upon the discoveries and findings of those who preceded me, utilizing these foundational principles to bolster my search for new sources of edge to better quantify this era of prospect and player evaluation.

I want to directly combat the divisive hot take culture that’s poisoned basketball discourse by providing high-quality, palpable commentary and proprietary statistics.

And above all, I want to have a positive impact on the collective consciousness of basketball enjoyers.

There are a litany of reasons I could point to when explaining my love for basketball, but the most chief cause among them is my firm belief that basketball is the most communal contemporary sport due to its unparalleled accessibility & variety. How many other sports retain their essence no matter how many players are participating? There’s no equivalent to 21 in football or HORSE in baseball. There’s no sport more intimately intertwined with the culture, more embracing of players’ individuality or more supportive of creative philosophies and playstyles.

Basketball is, and always has been, the people’s sport.

That’s why my guiding principles as a basketball creator have been centered around creating things that serve the needs & intellectual desires of the people who love this sport like I do. That’s what this site is for. It’s why I genuinely want to know if it falls short of that goal and if there are suggestions for features that could enhance fans’ experiences.

All I ask in return is that you factor this site, my commentary and analysis into your own basketball experience, and we all learn from the resulting discussions, together.

About the author

David Lee

I’m a lifelong basketball enthusiast who blends film study and advanced analytics in my independent coverage of basketball and the NBA Draft across TikTok, Twitter, YouTube, Substack and Instagram. I’ve also covered the Hawks for ~2 years as an accredited digital journalist for Afro News, and I am a member of the Atlanta Hawks’ Creators Collective.
