the paradox of skill

when i came across the concept of the paradox of skill, it resonated. not sure if this is my imposter syndrome or a sense of jealousy talking, but sometimes i feel like my own professional accomplishments (e.g. getting accepted into YC) and those of others around me are increasingly influenced by luck, even though the people surrounding me have gotten more talented. this concept gave me an intuitive framework to explain that. let me start with a hypothetical:

let's say you're hiring for a role, and you have a pool of candidates who have applied. naturally, you want to find the most skilled person amongst the candidates and offer them the job. to find them, you devise a series of assessments and interviews that test for the skills you care about. after administering these tests, you would expect the best person for the job to be the one who scores the highest on the tests. of course, this is not the case if the evaluations are poorly designed, but let's assume for the sake of argument that they're well designed. then, you should be able to identify the best candidate. but, and this is the subtlety, reliably doing so depends on the distribution of skill across that pool.

for example, if your pool had two candidates, and candidate A was much more skilled than candidate B, then your testing scheme should easily identify candidate A as the superior one. but if candidate A only had a slight edge over candidate B, then it would be much harder to isolate that. why? well, it's partly because the tests themselves are not perfect. you can improve your methodology, but you can never eliminate all the error. and it's also because as the skill levels get closer, the outcome becomes more affected by that error, or noise: the randomness that impacts measurement, e.g. the interviewer's mood that day, getting asked an obscure question, etc.

this is the phenomenon that is known as the paradox of skill:

as the skill level of a group increases, the outcome of any competition among them becomes more dependent on chance, and less a function of skill.

this is because as skill levels get closer, the noise in the measurement becomes more significant relative to the differences in skill. this concept was popularized by Michael Mauboussin (author of The Success Equation, a good read), and even though it's stated in terms of increasing skill, it applies just as well to any situation where the distribution of skill becomes more compressed, e.g. if the skill levels are all very low, then the outcome is also influenced more by chance. but in the real world, this effect is more interesting with rising (not falling) average performance, so that's the context i'll focus on.

demonstration

i'm sure there's a way to ground this idea in statistical theory, but i don't have the chops to do that. instead, i'll explain it with some fun simulations.

let's switch to a simpler example: a freethrow shooting contest. it works like this: each player gets 100 shots, and the one that makes the most shots, wins. easy. let's look at a single participant, player A. we can model their performance as a normal distribution, where the mean (μ) is the average number of shots they make in 100 attempts, and the standard deviation (σ) measures how much their performance varies across games.

[interactive chart: player A's score distribution, with sliders for the mean (μ) and the standard deviation (σ, shown at 5.0)]
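here's a minimal sketch of that model in python, using only the standard library. the mean of 75 is an assumed value for illustration (the text doesn't fix one), σ is set to 5, and the seed is only there to make the run repeatable.

```python
from statistics import NormalDist, mean, stdev

# hypothetical player: makes 75 of 100 shots on average, give or take 5
MU, SIGMA = 75.0, 5.0
player_a = NormalDist(mu=MU, sigma=SIGMA)

# simulate 10,000 contests' worth of scores for this one player
scores = player_a.samples(10_000, seed=42)

# share of contests that land within one standard deviation of the mean
within_one_sigma = sum(MU - SIGMA <= s <= MU + SIGMA for s in scores) / len(scores)

print(f"sample mean:  {mean(scores):.1f}")
print(f"sample stdev: {stdev(scores):.1f}")
print(f"within 1σ:    {within_one_sigma:.0%}")
```

for a normal distribution, roughly 68% of scores fall within one σ of the mean, so this player would make between 70 and 80 shots in about two out of three contests. note that a normal curve can technically produce fractional scores or scores outside 0 to 100; that's a simplification of the model, not a feature of real freethrow shooting.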

when you have multiple players, each will have their own normal distribution. for our first scenario, let's pick four players with widely varying skill levels (e.g. from a casual shooter to an elite one).

Player A (μ=55, σ=5), Player B (μ=68, σ=5), Player C (μ=80, σ=3), Player D (μ=87, σ=4)

in this case, the distributions are nicely separated, i.e. the skill levels are spread out. if you were to repeat the same contest among these players over and over again, you'd expect player D to win most of the time, player C to win some of the time, players A and B to win rarely. in other words, the outcome would mostly be determined by skill. we can verify this by running a simulation of 1000 games:

[interactive simulation: number of wins for player A, player B, player C, and player D over 1000 games]

you'd be hard pressed to find a scenario where player D doesn't win the majority of games by a large margin.
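if you want to run this yourself, here's a rough sketch of the 1000-game simulation for the four players above. it assumes each player's score in a game is an independent draw from their normal distribution and that the highest score wins; the function names and the seed are made up for this example.

```python
import random
from collections import Counter

# (mean, standard deviation) for each player in the first scenario
PLAYERS = {
    "A": (55, 5),
    "B": (68, 5),
    "C": (80, 3),
    "D": (87, 4),
}

def play_game(players, rng):
    """Draw one score per player and return the name of the top scorer."""
    scores = {name: rng.gauss(mu, sigma) for name, (mu, sigma) in players.items()}
    return max(scores, key=scores.get)

def simulate(players, n_games=1000, seed=0):
    """Count how many of n_games each player wins."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    wins = Counter(play_game(players, rng) for _ in range(n_games))
    return {name: wins[name] for name in players}

wins = simulate(PLAYERS)
for name, count in wins.items():
    print(f"player {name}: {count} wins ({count / 1000:.1%})")
```

because the scores here are continuous, ties essentially never happen. a real contest counts whole shots, so ties are possible there, and this sketch ignores them.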

now let's consider what happens when the player distributions are closer together.

Player A (μ=84, σ=4), Player B (μ=86, σ=4), Player C (μ=88, σ=4), Player D (μ=90, σ=4)

the ranking of players by skill is unchanged (player D still has a higher average than player C, who has a higher average than player B, etc.), but now the curves overlap heavily, with the means bunched between 84 and 90. what happens when we run the same simulation of 1000 games?

[interactive simulation: number of wins for player A, player B, player C, and player D over 1000 games]

player D still wins more often than anyone else, but now player C wins a lot more often than before, and players A and B also win a non-negligible number of games. this is intuitive but also strange at the same time... why is there more randomness in the outcome, even though the skill ordering of players hasn't changed?
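one way to sanity-check simulated win counts is to compute each player's win probability directly. under the normal model with independent scores, player i wins when their score x is higher than everyone else's, so P(i wins) = ∫ f_i(x) · Π_{j≠i} F_j(x) dx, where f is the density and F is the cumulative distribution. the sketch below approximates that integral with a midpoint riemann sum; the grid bounds and step size are arbitrary choices.

```python
from math import prod
from statistics import NormalDist

# second scenario: means bunched together, same noise for everyone
PLAYERS = {
    "A": NormalDist(84, 4),
    "B": NormalDist(86, 4),
    "C": NormalDist(88, 4),
    "D": NormalDist(90, 4),
}

def win_probability(name, players, lo=40.0, hi=140.0, step=0.01):
    """Approximate P(player `name` has the highest score) with a Riemann sum."""
    me = players[name]
    rivals = [dist for other, dist in players.items() if other != name]
    n_steps = round((hi - lo) / step)
    total = 0.0
    for k in range(n_steps):
        x = lo + (k + 0.5) * step  # midpoint of each slice
        # density of scoring x, times the chance that every rival scores below x
        total += me.pdf(x) * prod(r.cdf(x) for r in rivals) * step
    return total

probs = {name: win_probability(name, PLAYERS) for name in PLAYERS}
for name, p in probs.items():
    print(f"player {name}: {p:.1%}")
```

the four probabilities should add up to 1, and they should land close to the simulated win rates, within the sampling error you'd expect from only 1000 games.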

i've been using the term "skill" loosely so far, so let me make it more precise: skill is a latent variable, meaning it's not directly observable. we can't know a player's "true" skill, so we do our best to estimate it. in this case, we are using the player's historical performance as a proxy for their skill, e.g. the distribution of their scores across many games they've competed in. the mean of that distribution is just a way to summarize this proxy as a single number.

so when i say that player D is more skilled than player C, i mean player D's performance curve has a higher mean than player C's curve. from the first to the second scenario, the ordering of the players' means is unchanged, but the distance between the means has shrunk. and when there's less distance between the means, the variance in the scores / measurements becomes more significant in determining the outcome.

we can see this visually by considering just two players. the shaded region represents the probability of player C upsetting the superior player D. as the means get closer together, the region swells, and the probability of player C winning climbs toward 50%. the region exists at all only because of the variance in the scores (if the variance were zero, then the curves would be spikes, and player D would always win).

[interactive chart: player D (μ=85.0) vs. player C (μ=75.0), with the upset region (D loses) shaded. at a distance between means of 10.0, P(player C beats player D on a given game) = 3.9%]

that's why the outcome depends more on chance in the second scenario. it's because the distribution averages are closer together.
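for two players, the upset probability has a closed form. if the two scores are independent normals, their difference is also normal, with mean μ_D - μ_C and variance σ_D² + σ_C², so P(C beats D) = Φ(-(μ_D - μ_C) / √(σ_D² + σ_C²)). the sketch below assumes σ=4 for both players. that value isn't stated for the two-player chart, but it reproduces the 3.9% shown at a distance of 10.

```python
from math import sqrt
from statistics import NormalDist

def upset_probability(gap, sigma_fav=4.0, sigma_dog=4.0):
    """P(underdog outscores favorite) when the favorite's mean is `gap` higher.

    assumes independent, normally distributed scores; the default sigmas
    of 4 are an assumption, not a value given in the text.
    """
    # the score difference (favorite - underdog) is normal with this spread
    diff = NormalDist(mu=gap, sigma=sqrt(sigma_fav**2 + sigma_dog**2))
    # an upset happens when the difference comes out negative
    return diff.cdf(0.0)

for gap in (10, 6, 4, 2, 1, 0):
    print(f"distance between means = {gap:>2}: P(upset) = {upset_probability(gap):.1%}")
```

the numerator is the skill gap and the denominator is the combined noise, so shrinking the gap and raising the noise push the upset probability toward 50% in the same way.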

but why do we have any variance at all? well, that's because we're measuring skill with an imperfect process. these imperfections arise from noise: various factors that are not related to skill but can still impact outcomes. even in a controlled environment like a freethrow shooting contest, performance is going to be affected by subtle factors like how much air is in the ball, the level of humidity in the gym, the player's fatigue level, etc. these factors are what make the measurement, i.e. the outcome of each contest, more random. and this is important to understand:

there will always be noise in any estimation based on real-world measurements.

this noise, which produces the variance in data, is what we colloquially refer to as luck. so we can treat variance as a way to quantify luck.

one more thing about this variance: it's not just a property of the individuals, but also of the measurement process itself. in any competition, different participants can exhibit different levels of variance in their performance, meaning one player might perform more or less consistently than another. but usually, the variance is relatively constant across players, and it's more a property of the competition (even though it shows up in the data as variance in each individual distribution). freethrow shooting is pretty constrained, so the variance will be much lower than in a basketball game or a job interview. so the impact of luck differs across domains, and is tied more to the domain than to the individuals.

as a final demonstration, let's look at how the spread of skill and level of variance / luck can impact the outcome of our freethrow contest.

[interactive simulation: sliders for skill spread (best − worst) and noise / luck (σ), with players shown at A (μ=60.0), B (μ=70.0), C (μ=80.0), and D (μ=90.0). readouts: wins per player, best player win rate, pure chance baseline (25.0%), and skill edge]

as we vary the spread and variance, we can see that skill is still a factor. but as the spread decreases, the outcome of any one contest is more influenced by chance. this effect is more pronounced when the variance is higher, but it still exists even when the variance is low. in the extreme case where the skill spread is zero, the outcome is purely random (no matter the luck value), and each player wins about 25% of the time. this makes sense: if the players are all equally skilled, nothing but noise separates them.
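here's a rough sketch of that experiment as a sweep over both knobs. the setup is a guess at a reasonable one, not a spec: four players with means evenly spaced across the skill spread and centered on 75 (so a spread of 30 gives 60, 70, 80, 90), the same σ for every player, and "skill edge" taken to mean the best player's win rate minus the 25% baseline.

```python
import random

def best_player_win_rate(spread, sigma, n_games=20_000, n_players=4, seed=0):
    """Estimate how often the highest-mean player wins a single contest."""
    rng = random.Random(seed)
    center = 75.0  # assumed midpoint of the field
    # means evenly spaced from (center - spread/2) to (center + spread/2)
    means = [center + spread * (i / (n_players - 1) - 0.5) for i in range(n_players)]
    best = n_players - 1  # the last player has the highest mean
    wins = 0
    for _ in range(n_games):
        scores = [rng.gauss(mu, sigma) for mu in means]
        if scores.index(max(scores)) == best:
            wins += 1
    return wins / n_games

BASELINE = 1 / 4  # pure chance with four players

print("spread  sigma  best win rate  skill edge")
for spread in (30, 12, 6, 2, 0):
    for sigma in (2, 4, 8):
        rate = best_player_win_rate(spread, sigma)
        print(f"{spread:>6}  {sigma:>5}  {rate:>13.1%}  {rate - BASELINE:>+10.1%}")
```

reading down the table, the skill edge should shrink as the spread narrows, and shrink faster at higher σ. at a spread of zero, "best player" is just a label, so the win rate should hover around the 25% baseline. the sketch skips σ=0 because every score would then equal its mean, and a zero spread would produce nothing but ties.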

let's bring it home: in real-life competitive fields, the skill levels of participants naturally rise over time, but the spread of skill tends to shrink. this is typically because lower skilled participants either drop out or improve their skills faster than the top performers. and as our simulations show, this leads to more randomness in the outcome of competitions. this is the paradox of skill.

examples

claude helped me find this real-world example:

In the early years of the World Series of Poker Main Event, the field was tiny and repeat champions were common. Doyle Brunson won back-to-back in 1976-77, Johnny Chan in 1987-88, and Stu Ungar won three times total (1980, 1981, and 1997). as the field exploded from a handful of players to nearly 10,000 entrants, and the average skill level rose sharply, repeat winners became exceedingly rare. only four players have won the Main Event more than once across the entire 55-year history, and almost all of those came from the early, low-competition era. in recent times, a new name wins every single year, and the "best" players regularly bust out in the middle rounds.

a more relatable example:

if you've ever been on the job hunt, then you know that the process can be pretty random, especially at competitive tech companies like FAANG and OpenAI/Anthropic. sometimes, you bomb an interview because the interviewer probed the one gap in your preparation, or the chemistry just wasn't there with the interviewer, neither of which is an accurate reflection of your talent. but sometimes you also feel like you got lucky, because you happened to get tested on something you were really good at, or you passed an interview even though you thought you did poorly. the paradox of skill tells us that when the competition is fierce, relative skill differences shrink and random factors can play a big role: which interviewer you get, the order in which you're seen, whether the questions you get match your preparation, etc. unlike sports though, where the outcomes of competitions are readily quantifiable, there's an added source of randomness from the "ruler" itself, because interviews vary in their design and they don't have an objective rubric for success.

you might be thinking that my explanation so far fits some of the data, but conflicts with the rest. meaning, there are plenty of domains out there where we have observed rising average performance, but the best players still consistently win. think of chess or tennis, where a few notable names (e.g., Magnus Carlsen, Novak Djokovic) dominate the rankings year after year. how do we reconcile this with the paradox of skill? well, there are a few factors at play here:

lower variance: some games, like freethrow shooting or chess, have naturally lower variance. but games like poker or basketball have more randomness baked in, and just like the simulations show, higher variance entails more randomness in the outcome, even when we control for the spread of skill.
repetition: again, referring back to the simulations, you can see that even when the skill spread is compressed and noise is dialed up, there's uncertainty in the outcome of any one contest, but over many contests, the win rates still reflect the underlying skill ranking. so in tennis, even though the skill margin between top players is getting squeezed, the fact that they play many matches against each other allows the law of large numbers to distinguish the best players over time. but playing repeated contests is not always feasible. e.g. in hiring, each candidate can only practically get a few interviews at any company to prove their skill. and as variance increases, the number of contests needed to reliably identify the best individuals grows quickly (under a simple normal model, with the square of the noise relative to the skill gap), making it even more difficult in practice. it's in these low-repetition, high-variance regimes where the paradox of skill is most pronounced.
compounding advantages: in some domains, a player can have a small skill advantage or catch a lucky break, and that can compound over time to create a much larger advantage. e.g. college admissions at elite universities is a bit of a lottery, but if you get in, then you have a much higher chance of getting a top-tier job out of college (which may be harder these days with AI, but you get the point), which can lead to more opportunities, and so on.
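to put rough numbers on the repetition point: take two players whose mean scores differ by δ, each with per-contest noise σ. averaging over n contests shrinks the noise in each average to σ/√n, so under a normal model P(the better player has the higher average) = Φ(δ√n / (σ√2)). solving for n at a target confidence gives n ≈ 2(zσ/δ)², where z is the matching standard normal quantile. the gap of 2 shots and the 95% target below are arbitrary choices for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_better_player_ahead(gap, sigma, n):
    """P(the stronger player has the higher average score after n contests)."""
    # the difference of two n-contest averages has standard deviation sigma * sqrt(2 / n)
    return STD_NORMAL.cdf(gap / (sigma * sqrt(2 / n)))

def contests_needed(gap, sigma, confidence=0.95):
    """Smallest n for which the stronger player is ahead with the given probability."""
    z = STD_NORMAL.inv_cdf(confidence)
    return ceil(2 * (z * sigma / gap) ** 2)

GAP = 2.0  # assumed difference in mean score between the two players
for sigma in (2, 4, 8, 16):
    n = contests_needed(GAP, sigma)
    print(f"sigma = {sigma:>2}: about {n} contests "
          f"(P after 1 contest = {p_better_player_ahead(GAP, sigma, 1):.1%})")
```

in this model, doubling σ roughly quadruples the number of contests needed, and halving the skill gap does the same. that's manageable for a tennis season, and out of reach for a hiring loop with a handful of interviews.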

so the dynamics of competition, including the interplay of skill and luck, are much more nuanced, and the paradox of skill is just one of the many nuances.

wrapping up

i want to re-emphasize that the paradox of skill is not an explanation of how skill evolves in a field, but rather a consequence of how we evaluate skill. in other words, it's an artifact of the measurement process, and not a property of the individuals.

but the real question is: what do we do with this insight? i don't have a great answer. i gave you one way to mitigate randomness, which is increasing the number of reps or "shots on goal", though that's not always possible. maybe the more useful takeaway is just being aware of the phenomenon, which lets us be more compassionate with ourselves and others, especially when it comes to losing a competition. although, i still find it hard to apply this lesson in practice. so instead of compassion, i feel the paradox of skill gives me license to be a healthy amount of "delusional", in the sense that i can attribute (some of) my failures to bad luck and the success of others to good luck, implying that success is simply on the other side of trying again.