NBA Draft

Utilizing Archetypal Analysis to DeMystify the NBA Draft | Part 1: How RBAC Shaped My 2026 Board + NCAA Roles

Building a role-based archetype model to match the awesome complexity of projecting basketball development

By David Lee


I first applied Role-Based Access Control, known colloquially as RBAC in the data engineering world, in my very first post-grad client engagement. As an associate consultant who was mainly doing grunt work, I jumped at the chance to review the database architecture of the solution we were building. The user permissions I discovered were built upon individual permission management, which becomes exponentially more expensive as a project scales, since it requires multiple rows to define what a single user can access. While I wasn’t aware of RBAC as a formal concept at that time, my proposed solution was rooted in the foundational principles and value propositions of controlling access to resources through a meticulously curated role.

RBAC was formalized in 1992 by David Ferraiolo and Rick Kuhn, 46 years after the birth of ENIAC, modern computers' closest ancestor. As computers rapidly became more and more complex, the governing principles of how to manage their capabilities needed to evolve alongside them. At its core, RBAC is an amalgamation of tech bros' favorite buzzwords: "data governance", "data privacy", "principle of least privilege", etc., which are all different concepts describing the same overarching idea. Sophisticated & powerful databases need robust frameworks to enable effective interactions.

A well-architected database is rigorously designed to account for the needs of every potential user. Thus, defining access on an individual basis is a reflection of poor foundational planning. It's far more logical to define the database around roles than to try and build a role-agnostic database, because assigning individuals to the roles that best fit their use case & skillset is how the database runs smoother.
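To make the contrast concrete, here is a minimal Python sketch of the two approaches. The user names, role names and permission strings are all hypothetical and are not taken from the client engagement described above. The point is only that per-user grants need a row for every user/permission pair, while role-based grants let a user pick up (and stack) curated roles.

```python
# Hypothetical users, roles and permissions, for illustration only.

# Individual permission management: one row per (user, permission) pair.
per_user_grants = [
    ("ana", "read:sales"), ("ana", "read:finance"),
    ("ben", "read:sales"), ("ben", "read:finance"), ("ben", "write:pipelines"),
]

# Role-based management: permissions attach to roles, users attach to roles.
role_permissions = {
    "analyst": {"read:sales", "read:finance"},
    "engineer": {"write:pipelines"},
}
user_roles = {
    "ana": {"analyst"},
    "ben": {"analyst", "engineer"},  # roles can stack on a single user
}


def permissions_for(user):
    """Union of the permissions granted by every role the user holds."""
    perms = set()
    for role in user_roles.get(user, set()):
        perms |= role_permissions[role]
    return perms


def per_user_permissions(user):
    """Same answer, read from the row-per-grant table."""
    return {perm for name, perm in per_user_grants if name == user}


print(sorted(permissions_for("ben")))
```

Onboarding a new analyst in the role-based version is a single assignment to `user_roles`, whereas the per-user table needs one new row for each permission that analyst should hold.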

which came first, the role or the system?

My belief in this role-first approach transcends database architecture; it's a principle that I apply to systems in general. As I reflected upon potential points in my life where this philosophy sprouted, I realized that my fascination with roles is almost certainly because I attended International Baccalaureate public schools throughout the entirety of my K-12 schooling. My pre-collegiate years were heavily project-focused, which exposed me to role delegation early and often. The entire curriculum was built around the "IB Learner Profiles", a distribution of learning aspects that was used to encourage students to be holistic learners, embodying shades of each individual profile in our own learning style.

ib-learner-profile
i wanted that knowledgeable certificate more than anything

This heuristic of contextualizing learning and team building within nebulous archetypes has been a large part of my approach to the NBA draft. While I've refined my philosophy a great deal in a year's time, the portion below largely still remains my personal North Star.

"My draft philosophy is centered around quantifying, contextualizing and understanding a prospect's playstyle. My goal is to understand not just how a prospect functions, but why they function the way they do. When I uncover statistical red flags in a prospect's profile (negative AST/TO ratio, low rim frequency, low stock%, etc.), garnering an understanding of their role, team context and playstyle is my first priority, as it allows me to determine which problems are true limitations. Contextualizing players within roles also assists me in projecting their NBA outcomes, as certain playstyles have a lower threshold to attain rotation-level impact AND a prospect's ability to function within a variety of different playstyles, roles & contexts makes me more confident in projecting them as a quality NBA player. The tiers below are used to clearly define the line for returning at minimum rotation-level value:"

At the time that I wrote this, I was assigning roles using labels that felt intuitive relative to my understanding of the game, but I had no visibility into the statistical processes that went into how Bball Index assigned such roles. I also felt uneasy about the robustness of making deterministic assignments of player archetypes, as I was essentially presenting a very narrow path forward for each prospect, when each of them had varying degrees of adherence to the role I tagged. A key advantage of RBAC is that users can stack roles, simultaneously existing as both an analyst and an engineer. Exhibiting the Principled profile while also embodying the Knowledgeable profile. Projecting as an Offensive Engine while maintaining shades of being a Connective Shooter. The framework needed to match the awesome complexity of projecting basketball development.

arhcetypes-the5f-big board

Similarity Defined by Distances from Contrasting Ideals

archetypal analysis

I'll save all the in-depth nerd jargon for the end of this piece, though it's worth briefly touching on how I opted to use archetypoid analysis[1] to classify prospects.

Some helpful definitions as I walk through my process:

  1. Clustering: a family of unsupervised machine learning algorithms that organize and classify different objects, data points, or observations into groups or clusters
  2. Centroid: the central point or arithmetic average of a group of data points; you can think of it as the average stats of the players assigned to that group
  3. Measures of distance: mathematical functions used to quantify how similar or dissimilar two objects are based on their features
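The last two definitions can be sketched in a few lines of Python. The players and their two-stat lines (rim share, 3-point attempt rate) are made up for illustration, and Euclidean distance is just one common choice of distance measure, not necessarily the one any given clustering algorithm uses.

```python
import math

# Hypothetical stat lines: (rim share, 3-point attempt rate).
group = {
    "Player A": (0.60, 0.10),
    "Player B": (0.50, 0.20),
    "Player C": (0.70, 0.00),
}


def centroid(points):
    """Arithmetic mean of each feature across the group."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def euclidean(a, b):
    """Straight-line distance between two stat lines."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


center = centroid(list(group.values()))
distances = {name: euclidean(stats, center) for name, stats in group.items()}

for name, d in distances.items():
    print(f"{name}: {d:.3f} from the group centroid")
```

In this toy group, Player A sits essentially on the centroid while Players B and C are equally far from it in opposite directions, which is the kind of "how similar is this player to the group average" question clustering algorithms ask at scale.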

Clustering is a massive umbrella that encompasses many different algorithms that all use varying measures of distance to answer the same question: how similar is this subset of data? These algorithms are commonly used to perform sentiment analysis, customer segmentation[2] and image classification[3]. This was the data classification method I was most familiar with, but the outputs are extremely deterministic and they have a tendency to either ignore outliers or pull them into clusters they really don't belong to. The vast majority of these algorithms would have replicated the rigid output I was already producing manually.

Because I wanted to model archetypes multidimensionally, I considered using some of the clustering algorithms that can output the probability that a data point belongs to each of several clusters, though I wasn't able to find much public work specific to sports, and these algorithms don't quite reflect just how multifaceted the majority of basketball players are. Basketball is the one major sport that truly ebbs and flows, both in style of play and responsibility, and players regularly exhibit qualities of several different roles in the course of a single game and even more so over an entire season.

A few months ago, Jack Blasberg posted some posters from the Sloan Sports Analytics Conference. The poster in the bottom right corner caught my eye, as it seemed like it could be the exact type of approach I was looking for.

This paper from Clutch Data introduced me to archetypoid analysis (ADA), an unsupervised machine learning algorithm that seeks out the extreme points in a dataset instead of grouping the most similar ones, while still outputting representative probabilistic distributions of basketball archetypes. Their work applying ADA to the NBA was the example I needed to build proxies to classify collegiate playstyles, as explicit, NBA-quality playstyle data does not exist in a manipulable state for college basketball.
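The core idea, describing a player as a weighted blend of a few extreme profiles, can be sketched in plain Python. Everything here is an assumption for illustration: the three "archetypes", the two features (3-point attempt rate, rim share) and the player's stat line are made up. The methodology notes at the end of this piece describe solving the weights with non-negative least squares plus a sum-to-1 augmentation; to stay dependency-free, this sketch swaps in projected gradient descent onto the simplex, which targets the same constrained least-squares problem.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    ordered = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate  # keep the threshold from the largest valid j
    return [max(x - theta, 0.0) for x in v]


def membership_weights(player, archetypes, iterations=500):
    """Weights on the simplex minimizing ||player - sum_k w_k * archetype_k||^2."""
    k = len(archetypes)
    dims = len(player)
    weights = [1.0 / k] * k
    # Step size from a simple upper bound on the gradient's Lipschitz constant.
    bound = 2.0 * sum(sum(x * x for x in a) for a in archetypes)
    step = 1.0 / bound if bound > 0 else 1.0
    for _ in range(iterations):
        recon = [sum(weights[i] * archetypes[i][d] for i in range(k)) for d in range(dims)]
        residual = [r - p for r, p in zip(recon, player)]
        grad = [2.0 * sum(a[d] * residual[d] for d in range(dims)) for a in archetypes]
        weights = project_to_simplex([w - step * g for w, g in zip(weights, grad)])
    return weights


# Hypothetical extremes in (3-point attempt rate, rim share) space.
names = ["All-threes shooter", "All-rim finisher", "Neither"]
archetypes = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
player = (0.5, 0.2)

weights = membership_weights(player, archetypes)
for name, w in zip(names, weights):
    print(f"{name}: {w:.1%}")
```

Because the weights are non-negative and sum to 1, they read like the role percentages shown for each prospect: this toy player comes out as roughly half shooter, a fifth rim finisher and the rest "neither".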

The 8 offensive stats that were deemed worthy enough[4] to classify the below archetypes:

  1. Shot Profile Difficulty: difference between league average true shooting% and the player's expected true shooting% based on their shot profile
  2. Creation Difficulty: weighted 2-point creation metric
  3. Rim Share: % of FGA at the rim
  4. 3-Point Attempt Rate: % of FGA from three
  5. Dunk Rate: % of rim FGA that were dunks
  6. % of Field Goals Made that were Unassisted: unassisted buckets
  7. Box Creation: Ben Taylor's formula for estimated open shots a player creates for teammates per 100 possessions.
  8. Offensive Rebounds per 100 possessions: OREBs per 100
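Five of these eight inputs are simple ratios of season totals, which can be sketched as below. The dictionary keys and the example totals are hypothetical, not the actual schema or data behind the model. Shot Profile Difficulty, Creation Difficulty and Box Creation are left out because they need inputs (league averages, creation weights, Ben Taylor's formula) that aren't spelled out here.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def offensive_rates(totals):
    """Ratio-style offensive features from hypothetical raw season totals."""
    return {
        "rim_share": safe_ratio(totals["rim_fga"], totals["fga"]),
        "three_point_attempt_rate": safe_ratio(totals["three_pa"], totals["fga"]),
        "dunk_rate": safe_ratio(totals["dunk_attempts"], totals["rim_fga"]),
        "unassisted_share": safe_ratio(totals["unassisted_fgm"], totals["fgm"]),
        "oreb_per_100": 100.0 * safe_ratio(totals["oreb"], totals["possessions"]),
    }


# Made-up season totals for a single player.
example = {
    "fga": 400, "fgm": 200, "rim_fga": 120, "three_pa": 160,
    "dunk_attempts": 30, "unassisted_fgm": 90, "oreb": 50, "possessions": 2000,
}

rates = offensive_rates(example)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Note that Dunk Rate is divided by rim attempts rather than all attempts, matching the definition in the list above.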

The 8 Offensive Archetypes

Spot-Up Shooter

"Off-ball play finishers who do most of their damage from beyond the arc"

Defining Traits: High 3PR, Low Self-Creation

2026 Prospects: Fletcher Loyer (Purdue) 72.1% │ Isaiah Evans (Duke) 50.9% │ Alex Karaban (UConn) 49.3%

Historical Example: Paul George

evans-arhcetype-distribution

Primary Handler

"Scorers who shoulder a high shot difficulty burden"

Defining Traits: High Self-Creation, High Shot Creation, Low Offensive Rebounding

2026 Prospects: Nick Boyd (Wisconsin) 68.4% │ Ebuka Okorie (Stanford) 66.9% │ Quadir Copeland (NC State) 61.2%

Historical Example: James Harden

okorie-archetypes

Hustle Big

"Garbage man bigs who get their work done in the dunkers' spot"

Defining Traits: High Rim Rate & Offensive Rebounding, Low Shot Difficulty & 3PR

2026 Prospects: Tobe Awaka (Arizona) 50.4% │ Izaiyah Nelson (S. Florida) 46.4% │ Maliq Brown (Duke) 38.9%

Historical Example: Joel Embiid

maliq brown

Interior Creator

"Drivers & Post-Up Players that put pressure on the rim"

Defining Traits: High Self-Created Interior Scoring, Tough Shot Diet, Low 3PR

2026 Prospects: Melvin Council Jr. (Kansas) 58.0% │ Jaden Bradley (Arizona) 54.7% │ Michael Ajayi (Butler) 48.6%

Historical Example: Russell Westbrook

jaden-bradley

Rim Runner

"play-finishing vertical threat"

Defining Traits: Elite Dunk & Rim Rate, No Self-Creation or 3s

2026 Prospects: Ernest Udeh Jr. (Miami) 94.7% │ Felix Okpara (Tennessee) 68.4% │ Aday Mara (Michigan) 57.5%

Historical Example: Anthony Davis

aday-mara

Connective Shooter

"a slightly more dynamic spot-up shooter with a tougher shot diet"

Defining Traits: High 3PR, Low Rim Pressure & Offensive Rebounding

2026 Prospects: Donovan Atwell (Texas Tech) 58.6% │ Isaac McKneely (Louisville) 54.7% │ Ryan Conwell (Louisville) 52.6%

Historical Example: Bradley Beal

ryan-conwell

Interior Finisher

"an off-ball play finisher who mostly operates in the middle of the floor"

Defining Traits: Tough Interior Finishing, Lower Self-Creation & 3PR

2026 Prospects: Trey Kaufman-Renn (Purdue) 56.2% │ Chris Cenac (Houston) 52.8% │ Nick Martinelli (Northwestern) 40.1%

Historical Example: Kawhi Leonard

chris-cenac

Offensive Engine

"all-encompassing offensive forces"

Defining Traits: Elite Shot Creation & Self-Creation, Lower Rim Rates & Offensive Rebounding

2026 Prospects: Braden Smith (Purdue) 78.7% │ Christian Anderson (Texas Tech) 68.6% │ Darius Acuff Jr. (Arkansas) 67.2%

Historical Example: Kevin Durant

darius-acuff

The 6 defensive stats that were deemed worthy enough to classify the below archetypes:

  1. Steals per 100: STLs per 100 poss
  2. Blocks per 100: BLKs per 100 poss
  3. Defensive Rebounds per 100: DREB per 100 poss
  4. Block Recovery Rate: % of blocks recovered by the defending team
  5. Personal Fouls per 100: PF per 100 poss
  6. Height: in inches

The 6 Defensive Archetypes

Low-Activity Helper

"low-motor defenders who offer a bit of value on the glass, typically high-usage offensive players"

Defining Traits: Low Defensive Activity

2026 Prospects: Isaac McKneely (Louisville) 81.5% │ Bennett Stirtz (Iowa) 72.6% │ Kylan Boswell (Illinois) 70.4%

Historical Example: James Harden

benent0-stirtx

Energy Big

"bigs who defend away from the rim more often than traditional rim protectors"

Defining Traits: Shot-Blocking Big, lower foul rates

2026 Prospects: Emanuel Sharp (Houston) 46.5% │ Oscar Cluff (Purdue) 43.6% │ Maliq Brown (Duke) 41.2%

Historical Example: Karl-Anthony Towns

oscar-cluff

Rim Protector

"one-man anchors, defensive equivalent to the offensive engine"

Defining Traits: Elite Shot-Blocking & Size, High Defensive Rebounding

2026 Prospects: Ugonna Onyenso (Virginia) 90.7% │ Aday Mara (Michigan) 67.0% │ William Kyle III (Syracuse) 54.4%

Historical Example: Anthony Davis

ugonna

Havoc Creator

"highly disruptive perimeter defender"

Defining Traits: High Steals & Block-Recovery, Smaller Frame

2026 Prospects: Aaron Nkrumah (Tennessee State) 54.5% │ Duke Miles (Vanderbilt) 53.7% │ Ja'Kobi Gillespie (Tennessee) 42.8%

Historical Example: Stephen Curry

ja'kobi

Low-Activity Chaser

"off-ball defender who provides little value rotating to the rim or on the glass"

Defining Traits: Perimeter Chaser (some STL & fouls), Low Rim Protection

2026 Prospects: Tamin Lipsey (Iowa State) 67.7%[5] │ Otega Oweh (Kentucky) 47.1% │ Donovan Atwell (Texas Tech) 46.8%

Historical Example: Damian Lillard

lipsey

Weakside Helper

"glass-cleaner that ends possessions"

Defining Traits: High Defensive Rebounding & Size, Low Steals

2026 Prospects: Trey Kaufman-Renn (Purdue) 75.5% │ Jaxon Kohler (Michigan State) 74.7% │ Baba Miller (Cincinnati) 74.4%

Historical Example: Kevin Durant

baba

Interpretability, Takeaways and Learnings

In his own philosophy article, Jason stated that "the pursuit of new and the pursuit of correctness go hand-in-hand." Introducing these novel archetypes into my evaluation framework has quantified many of my opinions on this draft class and about prospect evaluation as a whole. For instance, my love for malleable prospects is modeled glowingly by Cam Boozer and Darryn Peterson.

cameron-boozer
Cam Boozer's role distribution
peterson
Darryn Peterson's Role Distribution

Dybantsa's role distribution, however, echoes Ahmed's assertion that he:

"may be the most acute representation of one player transforming the complexion of a team’s playstyle of any prospect in recent memory." - from Draft Notes: AJ Dybantsa
dybantsa
Dybantsa's role distribution

In today's NBA, where everyone can legitimately do a little bit of everything, where bigs can initiate offense, wings are running high PNR and guards are playing out of the dunkers' spot, role malleability, even from a primary offensive player, is absolutely paramount. The tradeoff of platforming a player with a very narrow optimal role distribution is that it reduces the ability of more holistic players to contribute, which, depending on the overall impact of that hyper-specialized player, could bottleneck the overall efficacy of the team.

Dybantsa boasts one of the most impressive self-creation seasons we've ever seen. 76.5% of his baskets were unassisted, ranking in the 95th%ile historically. He has a bonkers adjusted rTS% of 9.9% (95th%ile) with a 48.3 offensive load (99th%ile). There are just 24 prospects in my database who clear his 90th+%ile rankings as a load-bearing scorer. He rubs shoulders with the 2026 Finals MVP, one of the greatest scorers ever, his fellow 2026 draft classmates Christian Anderson & Labaron Philon and 2 absolute gems found outside the 1st round, Austin Reaves & Ajay Mitchell. The average height among these nuclear shot creators is just 6'2". Dybantsa measured a full 6 inches taller.

aj-sorted-by-height
Table sorted by height
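A query like the one behind this table boils down to ranking every prospect on each metric and keeping the ones who clear a percentile bar on all of them. The sketch below is only an illustration of that idea. The rows are made up (Prospect A reuses the unassisted share and offensive load quoted above), only two of the metrics are included, and the "share of values at or below" percentile definition is an assumption, since the exact method isn't specified here.

```python
def percentile_rank(values, x):
    """Share of values at or below x, on a 0-100 scale."""
    return 100.0 * sum(1 for v in values if v <= x) / len(values)


def clears_both(rows, threshold=90.0):
    """Names whose unassisted share AND offensive load both clear the threshold."""
    unassisted = [row[1] for row in rows]
    load = [row[2] for row in rows]
    return [
        name
        for name, unassisted_pct, offensive_load in rows
        if percentile_rank(unassisted, unassisted_pct) >= threshold
        and percentile_rank(load, offensive_load) >= threshold
    ]


# Hypothetical rows: (name, unassisted FGM %, offensive load).
prospects = [
    ("Prospect A", 76.5, 48.3),
    ("Prospect B", 40.0, 30.0),
    ("Prospect C", 55.0, 35.0),
    ("Prospect D", 80.0, 33.0),
    ("Prospect E", 30.0, 50.0),
    ("Prospect F", 62.0, 41.0),
    ("Prospect G", 45.0, 28.0),
    ("Prospect H", 70.0, 47.0),
    ("Prospect I", 35.0, 25.0),
    ("Prospect J", 50.0, 38.0),
]

print(clears_both(prospects))
```

Requiring every metric to clear the bar at once is what makes the list so short: Prospects D and E each top one column in this toy data but fall out because of the other.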

That is the incredibly enticing AJ Dybantsa sell. This is the biggest, most effective, load-bearing offensive usage sponge freshman prospect we've seen since Kevin Durant. And yet, despite the unbelievably overwhelming scoring indicators, Dybantsa's scoring process is also reminiscent of some of the biggest false primaries in NBA history.

AJ Dybantsa's Scoring Process is Suboptimal for his Projected Role

There are far more offensive engines at the collegiate level than there are in the NBA, of course in raw totals but also in viability. Remaining an impact player at the next level is a whole lot easier when you have other paths to success. Jevon Carter blossomed into an offensive engine in his senior season at West Virginia, but he was incredibly inefficient, particularly at the rim, shooting just 44.1%.

jevon

His defensive edge is what's kept him in the NBA, while the role inheritance from being overtasked in that senior season has allowed him to remain a slight threat running PNR on the weakside.

jevon carter

Labaron Philon was another name in that uber-productive offensive query, who I'll end up ranking higher than consensus relative to where I have Dybantsa because he's exhibited multiple paths to success. Philon played much more off-ball in his freshman season, grading out as a Primary Handler before making the leap to Offensive Engine in 2026.

philon

So Philon's profile marries together an uber-effective offensive engine and an effective off-ball handler who defended much better at lower usage, all while maintaining a very impressive 2P self-creation load AND efficiency. While he's tested extremely poorly in back-to-back years at the combine and he's incredibly light, I think his strong 2P signal is an indicator that he may have a deceptively strong trunk. The freshman-year defense also makes me think he could slot in nicely as a screen navigator and thrive in a Deuce McBride/Ajay Mitchell/Payton Pritchard-esque role: an off-ball guard who doesn't bleed TOO much defensive value and strongly complements lead creators on offense.

Seeing this persist across multiple collegiate seasons is why I'm more confident in my eval of Philon than Dybantsa, but it's also reflective of Philon's enticing duality as a player who can shapeshift himself up OR down the usage chart.

With all that being said, Boozer & Peterson will take their place in the 2 highest echelons of my board, because they were similarly impressive while exhibiting a more holistic footprint over the game.

Quick Hitters

I've enjoyed searching for anomalous role distributions at particular positions, with maybe my favorite example being Dailyn Swain's Primary Handler/Interior Creator split.

swain

I was extremely pleased to see Tari Eason pop up as a defensive comp for Allen Graves (and Covington as well).

graves-defense

Ebuka Okorie's similar profiles made me rethink him for a bit as well.

Next Steps

We're at the end of the article and I've only talked about the algorithmic output of the 1st 5 words in this sentence: "...classify prospects' current pre-NBA role, then project their NBA role, impact and playstyle based on their statistical profile and pre-NBA playstyle". This is mainly because I underestimated just how long it would take to fetch the NBA data I wanted to compare profiles historically, though I think not quite having the full model output finished has also afforded me the opportunity to slowly integrate these distributions into how I conceptualize prospects. It WILL be finished though, so be on the lookout for Parts 2 (and probably 3) of the expansion of this model.

Check out the brand new version of the 5th Factor and the updated blind big board quiz!


The Nerd Stuff

1. Objective

Create probabilistic role distributions utilizing Archetypoid Analysis to model extreme styles that describe playing style/role, not quality (good-vs-bad) or team context. Devise interpretable labels for each archetype.


2. Methodology

  • Convex membership. Each player-season's α-vector (the persisted
    distribution) is solved by non-negative least squares with a big-M sum-to-1
    augmentation.
  • Initialization: FurthestSum, then a bounded local-swap that
    minimizes reconstruction error (the Vinué archetypoid refinement).
  • Two separate models: offensive features → offensive roles; defensive
    features → defensive roles.
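The initialization step can be sketched as follows. This is a simplified, pure-Python reading of the FurthestSum idea (repeatedly pick the point with the largest summed distance to everything already picked, then swap out the arbitrary starting point), not the exact code behind the model. The toy points are made up, and the bounded local-swap refinement and the membership solve are not shown.

```python
import math


def furthest_sum(points, k, seed_index=0):
    """Pick k mutually distant row indices as starting archetypoid candidates."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    def distances_from(index):
        origin = points[index]
        return [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(origin, p)))
            for p in points
        ]

    chosen = [seed_index]
    summed = distances_from(seed_index)
    while len(chosen) < k:
        # Next pick: the unchosen point furthest (in total) from all picks so far.
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: summed[i])
        chosen.append(nxt)
        summed = [s + d for s, d in zip(summed, distances_from(nxt))]

    # The seed was arbitrary, so drop it and re-pick once without its influence.
    chosen.remove(seed_index)
    summed = [s - d for s, d in zip(summed, distances_from(seed_index))]
    replacement = max((i for i in range(n) if i not in chosen), key=lambda i: summed[i])
    chosen.append(replacement)
    return chosen


# Toy data: three extreme corners plus three middling points (index 0 is the seed).
points = [(1.0, 1.0), (0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (1.5, 1.0), (1.0, 0.5)]
print(sorted(furthest_sum(points, k=3)))
```

On this toy set the three corners are selected even though the search starts from a middling point, which is the behavior that makes FurthestSum a useful starting guess when the goal is to find extremes rather than averages.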

3. Quantitative measures used

Pruning methodology and more

quantification measures for Archetypoid Analysis

4. Staged model evolution

Decisions I made as I iterated through different features and models

model decisions



5. Final model

offensedefense

K diagnostics (final feature set):

  • Offense intra-archetype variance: K7 = 3.36 (min), K8 = 3.49 → method points to 7; RSS 7→8 is a flat plateau (2784→2768). K=8 kept by interpretability.
  • Defense RSS: K4 = 91,991 → K5 = 22,545 (the cliff) → K6 = 22,152 (plateau) → elbow at 5; K=6 kept because 5 merges the rebounding forward.
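The "cliff" and "plateau" language above is easier to see as relative drops in RSS from one K to the next. The short sketch below only reuses the RSS values quoted in the diagnostics; the helper name is illustrative.

```python
# RSS values quoted in the K diagnostics above.
defense_rss = {4: 91_991, 5: 22_545, 6: 22_152}
offense_rss = {7: 2_784, 8: 2_768}


def relative_drops(rss_by_k):
    """Fractional RSS reduction when moving from each K to the next one listed."""
    ks = sorted(rss_by_k)
    return {
        (a, b): (rss_by_k[a] - rss_by_k[b]) / rss_by_k[a]
        for a, b in zip(ks, ks[1:])
    }


for label, table in (("defense", defense_rss), ("offense", offense_rss)):
    for (a, b), drop in relative_drops(table).items():
        print(f"{label} K={a}->{b}: RSS falls {drop:.1%}")
```

Going from 4 to 5 defensive archetypes removes about three quarters of the reconstruction error, while 5 to 6 on defense and 7 to 8 on offense each remove less than 2%. That is why the extra archetype in both models is an interpretability choice rather than a fit-driven one.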

Interpretations: The most distinct archetypes are the bigs (Rim Runner 3.57, Hustle Big 3.02) and Primary Rim Protector (3.36). The least distinct are Connective Shooter (1.24) and Interior Creator (1.25) on offense and Low-Activity Helper (0.89, zero strong features, a 3,233-member near-average catch-all) on defense. Low-Foul Big (1.74) is an archetype I didn't want to keep, but it was legitimately informative (one strong axis, pf −1.6).


6. Final roles

Offense: Spot-Up Shooter · Secondary Handler · Hustle Big · Interior Creator ·
Rim Runner · Connective Shooter · Interior Finisher · Offensive Engine

Defense: Low-Activity Helper · Low-Foul Big · Primary Rim Protector · Havoc
Creator · Low-Activity Chaser · Weakside Helper

Footnotes

  1. If you're interested in reading more about archetypoid analysis, start here: https://www.sciencedirect.com/science/article/pii/S002002552100791X
  2. what types of customers a store has, etc.
  3. an image generator i made in undergrad that attempted to create novel images then classify them with a separate model
  4. for more details on how these statistics were selected & the methodology i used, read the portion at the end of the article
  5. a great example of the limitations of even ADA, Lipsey has been perennially one of the best defenders in college basketball but the model doesn't reward the steal rate without rebounding equity or impacting shots at the rim.

About the author

David Lee

I’m a lifelong basketball enthusiast who blends film study and advanced analytics in my independent coverage of basketball and the NBA Draft across TikTok, Twitter, YouTube, Substack and Instagram. I’ve also covered the Hawks for ~2 years as an accredited digital journalist for Afro News, and I am a member of the Atlanta Hawks’ Creators Collective.
