The Wrong Kind of Momentum

Author: Flynn Presence

It is the fifth set of the 2023 Wimbledon final. Carlos Alcaraz, twenty years old, is serving for the biggest title of his life. Novak Djokovic, 23-time Grand Slam champion, stands across the net. The stadium is unanimous: something has shifted. The momentum, every pundit agrees, belongs to Alcaraz now.

He went on to win. The narrative wrote itself.

But did winning under that kind of pressure make the next points more likely to go his way? Or did fifteen thousand people in the stands fall for one of sport’s oldest statistical traps?

Gilovich et al. (1985) famously dismissed the “hot hand” (the belief that a player’s chance of success is greater following a previous success than following a failure) as a “cognitive illusion.” However, Miller and Sanjurjo (2018) proved that this analysis contained a streak selection bias (a conditional-probability artefact that drags estimates of momentum below their true value in short sequences). In tennis, recent Cumulative Sum (CUSUM)-based studies such as Du et al. (2025) find strong statistical evidence of momentum. However, they rely on small samples (31 matches) and measure correlation, not cause.

This project scales the test. I analysed 56,253 points from volunteer-charted matches across all four 2023 Grand Slams and both professional tours (the men’s ATP and women’s WTA). That sample is far larger than the 31 matches analysed by Du et al. (2025). The results suggest that much of what commentators call momentum after a break point may instead emerge from the structural serving advantage on the following point.

The Data: Four Grand Slams, Two Tours

56,253 points analysed · 297 Grand Slam matches · 156 professional players · 4 Grand Slams · 2 tours

The raw data comes from Jeff Sackmann’s Match Charting Project, a volunteer-compiled database in which tennis analysts manually record every point and its exact score state. Recent momentum research such as Du et al. (2025) analysed 31 matches from the 2023 Wimbledon men’s singles. This project instead draws on the Match Charting Project’s coverage of all four 2023 Grand Slams and both tours. Its contribution is scale, together with the distinction between break points and tiebreaks. Break points (where the receiving player is one point away from winning the game) and tiebreaks (a high-stakes, point-by-point game used to decide a tied set) are the high-leverage moments. They are relatively rare, making up roughly 12% of ATP points and 13% of WTA points.

The dataset covers 38,488 ATP points (169 matches, best-of-five sets) and 17,765 WTA points (128 matches, best-of-three). Official rankings control for baseline skill. I kept the tours separate, since pooling would have obscured the comparison between them.

Does Momentum Even Show Up in the Data?

Does winning a point make the next one more likely to go your way?

To find out, I ran two tests separately for each of 78 ATP and 78 WTA players. The first is a Chi-squared test: is the outcome of a point independent of the previous one? The second is a Wald-Wolfowitz runs test: are wins and losses more clustered than chance alone would produce?
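Both per-player tests can be sketched in plain Python. This is a minimal illustration, not the project's pipeline. It assumes a player's points are coded 1 (won) or 0 (lost) in playing order, and the function names are hypothetical. The chi-squared test works on the 2×2 table of previous-point versus current-point outcomes. The runs test uses the usual normal approximation.

```python
import math


def transition_counts(points):
    """Count (previous, current) outcome pairs in a 0/1 point sequence."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for prev, cur in zip(points, points[1:]):
        counts[(prev, cur)] += 1
    return counts


def chi2_independence(points):
    """Pearson chi-squared test (no continuity correction) on the 2x2 table of
    previous-point vs current-point outcomes. Returns (statistic, p-value).
    Assumes no row or column of the table is empty (else an expected count is 0)."""
    c = transition_counts(points)
    n = sum(c.values())
    row = {a: c[(a, 0)] + c[(a, 1)] for a in (0, 1)}
    col = {b: c[(0, b)] + c[(1, b)] for b in (0, 1)}
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = row[a] * col[b] / n
            stat += (c[(a, b)] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def runs_test(points):
    """Wald-Wolfowitz runs test with the normal approximation.
    Returns (z, two-sided p-value); negative z means fewer runs (streakier)."""
    n1 = sum(points)          # points won
    n2 = len(points) - n1     # points lost
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(points, points[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


# A perfectly alternating sequence is as far from streaky as possible.
alternating = [1, 0] * 20
print(chi2_independence(alternating))
print(runs_test(alternating))
```

In the runs test, a streaky player produces fewer runs than expected (negative z), while an alternating pattern produces more (positive z). Library versions such as SciPy's chi2_contingency may apply a continuity correction to 2×2 tables by default, so their p-values can differ slightly from this sketch.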

Figure 1 summarises per-player test results across all 156 players. Without correction, 14.1% of ATP players (Chi-squared) and 15.4% (runs test) showed non-random sequences. Both figures are above the 5% expected by chance at the 0.05 significance level, which initially appears to support momentum. WTA figures were lower, at 5.1% and 7.7%, barely clearing that baseline. I then applied a Benjamini-Hochberg correction, which limits the share of false alarms among many simultaneous tests. After correction the signal nearly disappears. Only one ATP player survives the runs test and none survive the Chi-squared test, while one WTA player survives both tests. Once adjusted, evidence for individual momentum largely evaporates.
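The Benjamini-Hochberg step-up procedure is short enough to write out. This sketch uses a hypothetical helper, not the project's code. It ranks the p-values and finds the largest rank k with p(k) ≤ k·α/m. It then rejects every hypothesis up to that rank.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery
    rate at level alpha. Returns a list of booleans (True = rejected) in the
    same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Largest rank k (1-based) whose p-value is at or below k * alpha / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


print(benjamini_hochberg([0.001, 0.02, 0.04, 0.2]))  # [True, True, False, False]
```

For example, suppose 11 of 78 tests each returned a hypothetical p-value of 0.04. All eleven would pass an uncorrected 0.05 threshold, yet none survives the correction. The eleventh-ranked p-value would need to fall below 11 × 0.05 / 78 ≈ 0.007.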

But I cannot stop here. Miller and Sanjurjo (2018) showed that a standard way of measuring streaks in short sequences is systematically biased against finding momentum. That method compares success rates after successes with the overall rate. Tests like these may therefore lack the resolution to detect subtler patterns.
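The bias Miller and Sanjurjo describe can be reproduced by brute force. The sketch below uses a hypothetical helper. It enumerates every equally likely win/loss sequence of four points with a fair 50% win chance and computes, within each sequence, the share of wins that follow a win. It then averages those shares across sequences. Even with no momentum at all, the average falls well below 0.5.

```python
from fractions import Fraction
from itertools import product


def expected_win_after_win(n):
    """Average, over all equally likely 0/1 sequences of length n (fair 50%
    chance per point), of the within-sequence share of wins that follow a win.
    Sequences with no win in the first n-1 points are skipped because the
    share is undefined for them."""
    shares = []
    for seq in product((0, 1), repeat=n):
        followers = [seq[i + 1] for i in range(n - 1) if seq[i] == 1]
        if followers:
            shares.append(Fraction(sum(followers), len(followers)))
    return sum(shares) / len(shares)


print(expected_win_after_win(4))  # 17/42, about 0.405 rather than 0.5
```

The average approaches 0.5 only as sequences become long, which is why the bias matters most in short sequences.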

Figure 2 tracks the CUSUM score, a running tally of overperformance, across the longest WTA match. When a player wins points at a higher rate than their match average, the line climbs. Genuine momentum would look like sustained directional movement.

Instead, Alexandrova performs above her average almost continuously. The line stays positive for 88 points, dips briefly, then sharply recovers to finish above +4. A momentum-based reading in the spirit of Du et al.’s (2025) CUSUM framework would register these fluctuations as shifting momentum. A simpler reading is a player performing consistently well, interrupted by a single brief slump. That looks less like momentum than like ordinary match variance. This single match is illustrative, not a formal test.
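A CUSUM line of this kind can be sketched as a running sum of deviations from a reference win rate. The reference rate is an assumption here, since the exact baseline behind Figure 2 is not spelled out, and the function name is hypothetical. One property is worth knowing. If the reference is the player's own final win rate for the match, the line always ends at exactly zero, because the deviations then sum to zero.

```python
from itertools import accumulate


def cusum(points, reference_rate):
    """Running sum of (outcome - reference_rate) over a 0/1 point sequence.
    The line rises while points are won more often than reference_rate."""
    return list(accumulate(p - reference_rate for p in points))


points = [1, 1, 0, 1]
print(cusum(points, 0.5))                        # [0.5, 1.0, 0.5, 1.0]
print(cusum(points, sum(points) / len(points)))  # [0.25, 0.5, -0.25, 0.0]
```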

What About the Biggest Moments?

To test pressure situations, I calculated each player’s Tiebreak Over-Expectation (TBOE). This is the gap between their point-win rate in tiebreaks and their own match average, so a positive score means they perform above their usual level in tiebreaks. If players consistently raise their game, I would expect outliers pulling sharply away from zero.

To prevent small-sample noise from creating artificial outliers, Figure 3 only plots players with at least 20 tiebreak points (28 ATP, 7 WTA). In the ATP, scores cluster tightly between -0.14 and +0.11; no one sharply separates from the pack. The WTA sample is too small for strong conclusions: one player sits visibly apart at the bottom, though with only 7 qualifiers the spread likely reflects the limited sample rather than genuine clutch ability.
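TBOE can be computed from point-level rows in a few lines. This sketch makes simplifying assumptions. The row layout and function name are hypothetical, and each player's tiebreak and overall win rates are pooled across all their matches rather than computed match by match. The 20-point minimum mirrors the filter used for Figure 3.

```python
from collections import defaultdict


def tiebreak_over_expectation(rows, min_tb_points=20):
    """rows: iterable of (player, won, in_tiebreak) tuples, one per point,
    where won and in_tiebreak are booleans. Returns
    {player: tiebreak win rate - overall win rate}, keeping only players
    with at least min_tb_points tiebreak points."""
    # Per player: [points won, points played, tiebreak won, tiebreak played]
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for player, won, in_tiebreak in rows:
        t = totals[player]
        t[0] += won
        t[1] += 1
        if in_tiebreak:
            t[2] += won
            t[3] += 1
    return {
        player: t[2] / t[3] - t[0] / t[1]
        for player, t in totals.items()
        if t[3] >= min_tb_points
    }


rows = (
    [("Player A", i < 40, False) for i in range(80)]   # 40 of 80 regular points
    + [("Player A", i < 14, True) for i in range(20)]  # 14 of 20 tiebreak points
    + [("Player B", True, True) for _ in range(10)]    # too few tiebreak points
)
print(tiebreak_over_expectation(rows))  # Player A only, TBOE about +0.16
```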

Crucially, TBOE only captures an overall tiebreak tendency. It cannot answer the real question: does winning a specific pressure point cause the next one to go the same way?

Correlation or Causation?

Everything so far tests correlation. But a dominating player naturally wins more break points; the association might simply reflect that good players win more of everything, not that winning the big point caused the subsequent success.

To separate correlation from causation, I used a causal forest, a machine learning method built on random forests that estimates causal effects. Unlike regression methods that estimate a single average effect, it allows the causal effect to vary across player subgroups. The model relies on one crucial assumption: once I adjust for a player’s skill, form, and match trajectory, no unobserved factor affects both the outcome of the pressure point and the outcome of the following point.

Following the point-to-point dependence framing of Gilovich et al. (1985) and Miller and Sanjurjo (2018), the treatment is a high-leverage win: a won break point or tiebreak point. The comparison group is all other match points rather than lost pressure points. This keeps the comparison group consistent across both leverage types, since lost pressure points raise different complications in break-point and tiebreak contexts. To isolate the causal effect beyond existing skill and form, the model holds four variables constant: official player ranking, rolling win percentage (last ten points), short-term winning streak (last four points), and the CUSUM match tracker.
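The treatment, the outcome, and two of the form controls can be built from a single player's point sequence. In this sketch the helper and its field names are hypothetical. The controls are computed only from points before the pressure point, an assumption that keeps them unaffected by the treatment itself. Ranking and the CUSUM tracker would be joined on separately.

```python
def build_rows(won, high_leverage, window=10, streak_len=4):
    """won[i] is 1 if the player won point i, else 0; high_leverage[i] is True
    for a break point or tiebreak point. Returns one dict per point that has
    a full history window and a following point. Assumes streak_len <= window."""
    rows = []
    for i in range(window, len(won) - 1):
        history = won[i - window:i]  # the points before point i only
        rows.append({
            # Treatment: the player won a high-leverage point.
            "treated": int(high_leverage[i] and won[i] == 1),
            # Outcome: the player won the next point.
            "outcome": won[i + 1],
            # Control: share of the previous `window` points won.
            "rolling_win_pct": sum(history) / window,
            # Control: 1 if the previous `streak_len` points were all won.
            "streak": int(all(won[i - streak_len:i])),
        })
    return rows


won = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0]
high_leverage = [False] * 10 + [True, False]
print(build_rows(won, high_leverage))
```

Lost pressure points fall into the comparison group (treated = 0), matching the comparison described above.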

(Full model parameters, subsampling limits, clustered standard errors, and VIF checks are detailed in the project README on GitHub.)

Figure 4 shows the estimated difference in next-point win probability between high-leverage point wins and the rest of the dataset, holding skill, form, and momentum constant. Each estimate is a conditional average treatment effect (CATE), computed separately for every observation. The estimated effect is generally positive across ranking bands, though Figure 6 shows that this signal is largely driven by the structural serving advantage following break-point wins.
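Per-observation CATEs are often summarised in two ways: an overall average with a standard error, and averages within subgroups such as ranking bands. The sketch below is hypothetical, and its function names and band edges are illustrative rather than the article's. Its standard error is the CLT-style sd/√n, which treats the estimates as independent draws. Estimates from one forest share training data, so that assumption does not hold and the resulting interval is only indicative.

```python
import math
from collections import defaultdict


def summarise_cates(cates):
    """Mean of per-observation CATEs with a CLT-style standard error, sd / sqrt(n).
    This treats the estimates as independent draws, which forest CATEs are not,
    so the resulting interval is only indicative."""
    n = len(cates)
    mean = sum(cates) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in cates) / (n - 1))
    return mean, sd / math.sqrt(n)


def band_label(rank, edges=(10, 50, 100)):
    """Map an official ranking to an illustrative band label."""
    for edge in edges:
        if rank <= edge:
            return f"top {edge}"
    return f"outside top {edges[-1]}"


def mean_cate_by_band(ranks, cates):
    """Average CATE within each ranking band."""
    groups = defaultdict(list)
    for rank, cate in zip(ranks, cates):
        groups[band_label(rank)].append(cate)
    return {band: sum(v) / len(v) for band, v in groups.items()}


print(summarise_cates([0.1, 0.2, 0.3]))
print(mean_cate_by_band([5, 8, 40, 200], [0.1, 0.3, 0.2, 0.0]))
```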

Figure 5 breaks this down by variable: the break point row sits right of zero in both tours; rolling win percentage runs in opposite directions between tours (negative ATP, positive WTA), suggesting recent form plays a different role across tours; and baseline skill alone sits close to zero in both.

The combined estimated effect is positive for both tours (ATP: +0.0957, WTA: +0.0532). If I stopped here, the picture would suggest that winning any high-leverage moment provides a consistent positive boost. But this combined number treats break points and tiebreaks as identical. What happens when I split them apart?

The Finding That Changes Everything

Figure 6 opens with the combined effect; clicking “Split by Pressure Type” reveals the break-point and tiebreak estimates separately. The next-point win probability following a won break point is around 15 percentage points higher in the ATP and 6 in the WTA, holding skill, form, and momentum constant. But this positive signal is unlikely to reflect psychological momentum. The evidence points instead to a structural feature of how tennis scoring works.

Whether the server saves it or the returner converts it, the break-point winner serves the next point. A server who saves it keeps serving in the same game, and a returner who converts it serves the following game. I verified this empirically: across all 1,497 ATP break-point wins, the break-point winner served the next point 100% of the time. Since servers typically win 60–65% of points, this built-in serving advantage likely accounts for much of the boost.
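The serving logic and its arithmetic consequence fit in a few lines. This sketch assumes independent points and a fixed serve-win probability. The value 0.64 is illustrative, taken from the 60–65% range. The 0.5 comparison is a simplification for a point equally likely to be served by either of two evenly matched players, not the model's actual baseline. Both function names are hypothetical.

```python
def next_server_after_break_point(server, returner, returner_won):
    """Who serves the point after a break point (match-ending points aside).
    If the returner converts, the game ends and serve passes to them for the
    next game; if the server saves it, the game continues on the same serve.
    Either way, the break-point winner serves next."""
    return returner if returner_won else server


def serve_only_gap(p_serve):
    """Next-point win probability of the player serving it (p_serve), minus
    0.5, the chance for a point equally likely to be served by either of two
    evenly matched players. Assumes independent points."""
    return p_serve - 0.5


print(next_server_after_break_point("A", "B", returner_won=True))  # B
print(round(serve_only_gap(0.64), 2))  # 0.14
```

Under these assumptions the serve alone produces a gap of about 14 percentage points, which shows how a double-digit gap can arise without any psychological component.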

In tiebreaks, the serving advantage is largely neutralised, and the ATP signal disappears. Serve alternates every two points, so the point count fixes who serves next, not the winner of the last point. Winning a tiebreak point therefore confers no clear serving advantage on the next. In the ATP, the estimated effect is statistically indistinguishable from zero (+0.0003, SE = 0.0016). In the WTA, the estimated effect is negative (-0.0564, SE = 0.0086), but its magnitude should be treated with caution. The WTA sample is smaller (209 treated observations), and the standard error is a CLT approximation. Individual CATE estimates are not independent draws, so the resulting confidence interval should be read as indicative rather than definitive.
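Tiebreak serve order can be written as a function of the point number alone. The sketch below uses hypothetical helpers and assumes that points are independent and that both players win a common share p of their service points. It averages, over consecutive points in a 13-point tiebreak, the chance that a point's winner also wins the next one. Exact fractions show that the result is one half.

```python
from fractions import Fraction


def tiebreak_server(point_number):
    """Server of tiebreak point n (1-based): player 'A' serves point 1,
    then serve alternates every two points (B: 2-3, A: 4-5, ...)."""
    return "A" if (point_number // 2) % 2 == 0 else "B"


def winner_wins_next(p_serve, n_points):
    """Average probability, over consecutive pairs of points in an n_points
    tiebreak, that the winner of one point also wins the next, assuming
    independent points and a common serve-win probability p_serve."""
    q = 1 - p_serve
    probs = []
    for k in range(1, n_points):
        if tiebreak_server(k) == tiebreak_server(k + 1):
            probs.append(p_serve ** 2 + q ** 2)  # same server for both points
        else:
            probs.append(2 * p_serve * q)        # serve changes hands
    return sum(probs) / len(probs)


print(winner_wins_next(Fraction(16, 25), 13))  # 1/2
```

Pairs on the same serve favour the previous winner, and pairs across a serve change penalise them, so the two cancel. With an odd number of consecutive pairs, the average is close to, but not exactly, one half.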

Because break points are far more common than tiebreaks, the earlier combined estimates are dominated by the serving-transition signal (the structural advantage of the break-point winner always serving the next point). That dominance obscures how different the tiebreak estimates actually are.

If Not Momentum, Then What?

Beyond the serving mechanism, what predicts how large the estimated effect is across different players and situations? Figure 7 ranks the model’s control variables by their predictive weight.

The cumulative match tracker (CUSUM) dominates. CUSUM captures accumulated match drift, the cumulative gap between actual and average performance. This drift accounts for around 56% of the model’s weight in both tours, with baseline player rank and rolling win percentage accounting for most of the remainder.

Crucially, Winning Streak (whether a player won their last four consecutive points) scores only 1% to 2%. Whether a player is on a hot streak therefore does not meaningfully change how large the estimated effect of a break-point win is. That is a distinct question from whether streaks predict outcomes in general, which the independence tests above address directly. Within the causal forest, streak status adds almost nothing once match drift is accounted for.

What most determines how winning a high-leverage point affects a player is accumulated match drift, not short-term bursts of form.

The Uncomfortable Conclusion

At least in tennis, the evidence suggests that the long-running “hot hand” debate in sport has been asking an imprecise question. That question has been clouded by a serving advantage mistaken for psychology.

The estimated gap between break-point wins and the rest of the dataset is real and consistent. But the evidence points toward a mechanical explanation rather than a psychological one. Because break points occur far more often than tiebreaks, pooled analyses are dominated by this serving-transition signal. In tiebreaks, where the serving advantage is largely neutralised, the ATP signal disappears and the WTA estimate turns negative. The smaller WTA tiebreak sample and approximate inference warrant caution, however.

(Note: The combined and break-point average treatment effects remain stable under an alternative model specification that uses ranking controls only. This dataset naturally skews toward elite matches and cannot measure unobserved physiological factors such as fatigue. Whether momentum compounds across longer sequences remains open for future research. Separating the psychological component from the structural serving advantage would require subgroup analysis beyond the scope of this study. A natural extension would test whether apparent momentum patterns persist under Monte Carlo simulations that preserve serve order and player win rates. That test would directly ask whether the observed patterns could emerge from match structure alone.)
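The Monte Carlo extension can be sketched in a toy form. It is heavily simplified and hypothetical: sets and tiebreaks are ignored, serve simply alternates each game, and both players share one serve-win probability (0.64, chosen for illustration). Every point is independent by construction, so any next-point advantage for break-point winners can only come from match structure.

```python
import random


def simulate_games(n_games, p_serve, seed=0):
    """Simulate independent points game by game (sets and tiebreaks ignored).
    Players 0 and 1 alternate serve each game, and the server wins each point
    with probability p_serve. Returns (winner, was_break_point) per point."""
    rng = random.Random(seed)
    points = []
    server = 0
    for _ in range(n_games):
        s = r = 0  # points won so far in this game by server and returner
        while True:
            # Break point: the returner is one point away from winning the game.
            break_point = r >= 3 and r > s
            server_won = rng.random() < p_serve
            points.append((server if server_won else 1 - server, break_point))
            if server_won:
                s += 1
            else:
                r += 1
            if max(s, r) >= 4 and abs(s - r) >= 2:
                break  # game over
        server = 1 - server  # serve passes to the other player
    return points


def bp_winner_next_point_rate(points):
    """Share of break points whose winner also won the following point."""
    hits = total = 0
    for (winner, was_bp), (next_winner, _) in zip(points, points[1:]):
        if was_bp:
            total += 1
            hits += next_winner == winner
    return hits / total


points = simulate_games(50_000, p_serve=0.64, seed=1)
print(bp_winner_next_point_rate(points))  # close to 0.64 in this toy model
```

In this toy model, the break-point winner wins the next point at close to the serve-win rate because they always serve it. A fuller version would plug in each player's observed serve and return rates and the real serve order, then compare simulated statistics with the observed ones.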

The next time a commentator says Carlos Alcaraz has the momentum after converting a break point, the data offers a different reading: what he has is the serve. And if he just won a tiebreak point instead? The ATP data finds no signal. That, not the break point, is where the question of psychological momentum is most cleanly answered.

Data · Jeff Sackmann Match Charting Project and ATP/WTA official results (2023).

Methodology · Logistic regression and heterogeneous-effects causal forest (estimating how effects vary across player types) via econml, controlling for player ranking, rolling win percentage, winning streak length, and CUSUM. Full replication pipeline available on GitHub.

Module · BEE2041 Data Science in Economics · University of Exeter.

References

Du, C., Zhang, C. and Zhou, L. (2025). A novel methodological framework for analyzing the momentum effect in tennis singles. [online] Available at: https://arxiv.org/abs/2509.01243 [Accessed 20 Apr. 2026].
Gilovich, T., Vallone, R. and Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, [online] 17(3), pp.295–314. doi:https://doi.org/10.1016/0010-0285(85)90010-6.
Miller, J.B. and Sanjurjo, A. (2018). Surprised by the Hot Hand Fallacy? A Truth in the Law of Small Numbers. OSF Preprints. doi:https://doi.org/10.31219/osf.io/sv9x2.
Sackmann, J. (2023). The Match Charting Project. [online] GitHub. Available at: https://github.com/JeffSackmann/tennis_MatchChartingProject [Accessed 20 Apr. 2026].