The MAX factor, out of sample, 2016 to 2026
The original claim is that the lowest-MAX decile beats the highest-MAX decile by over 1% per month. On 122 months of US equity data, value-weighted with NYSE breakpoints and stocks-only filtering, the spread is not merely absent but inverted: high-MAX beats low-MAX by 1.80% per month, so the low-minus-high spread has t = -2.57. The sign holds across three equal-weighted sensitivities, with t-statistics from -1.81 to -4.12. The backfill update of 2026-04-11 confirms that the inversion survives the Newey-West 12-lag HAC adjustment at t = -2.18, and that it is stronger in the 2016-2021 first half (t = -2.29) than in the 2021-2026 second half (t = -1.38).
Source paper
Bali, Cakici, Whitelaw (2011) "Maxing Out: Stocks as Lotteries and the Cross-Section of Expected Returns"
The claim
Bali, Cakici, and Whitelaw (2011) published one of the most cited papers in the behavioral-finance-meets-cross-section literature. They proposed that investors overpay for stocks with recent lottery-like payoffs, so stocks with the highest maximum daily return in the prior month (the MAX factor) should go on to underperform. The abstract states the result verbatim:
“Average raw and risk-adjusted return differences between stocks in the lowest and highest MAX deciles exceed 1% per month.”
Their sample was July 1962 to December 2005 on the NYSE, AMEX, and NASDAQ universe (Bali, Cakici, Whitelaw 2011).
What we tested
Roughly twenty years of changing market structure separate the end of their sample (December 2005) from the end of ours (April 2026). Zero rates, retail trading booms, the meme-stock episode, COVID, and the rise of zero-day-to-expiry options have all happened in between. The out-of-sample question is blunt: does the MAX anomaly still work in the post-2015 US equity market, in the same direction, with anything close to the same magnitude?
Sample
- 2016-01-05 to 2026-04-09 (~10 years, 122 usable formation months)
- 5,568 US equities in the raw price cache, merged with 8,479 rows of company profile data on 6,293 stocks-only candidates after dropping ETFs, funds, and non-main exchange listings (OTC, CBOE, PNK)
- Daily OHLCV, close-to-close returns
Methodology
- Compute each stock’s MAX as the maximum daily simple return within a calendar month
- At the end of each month, sort stocks into deciles by that month’s MAX
- Hold each decile portfolio for one month and rebalance at the next month end
- The long-short factor return is decile 1 (lowest MAX) minus decile 10 (highest MAX)
- Report mean, standard deviation, and t-statistic of the monthly long-short time series
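The steps above can be sketched in pandas as follows. This is a minimal illustration, not the replication script: the long-format `daily` frame with `date`, `symbol`, and `ret` columns and all function names are assumptions made for the example, and it uses equal weights with universe breakpoints for brevity.

```python
import numpy as np
import pandas as pd


def monthly_max_and_next_return(daily: pd.DataFrame) -> pd.DataFrame:
    """Per symbol and calendar month: MAX and the following month's return.

    Assumes hypothetical columns: date (datetime64), symbol, ret
    (close-to-close simple daily return).
    """
    d = daily.copy()
    d["month"] = d["date"].dt.year * 12 + d["date"].dt.month  # integer month index
    d["gross"] = 1.0 + d["ret"]
    out = (
        d.groupby(["symbol", "month"])
        .agg(max_ret=("ret", "max"), gross=("gross", "prod"))
        .reset_index()
    )
    out["month_ret"] = out["gross"] - 1.0  # compounded monthly return
    out = out.sort_values(["symbol", "month"]).reset_index(drop=True)
    by_symbol = out.groupby("symbol")
    out["next_ret"] = by_symbol["month_ret"].shift(-1)
    out["next_month"] = by_symbol["month"].shift(-1)
    # Keep only rows whose next observation is the consecutive calendar month.
    ok = out["next_month"] == out["month"] + 1
    return out.loc[ok, ["symbol", "month", "max_ret", "next_ret"]]


def long_short_series(panel: pd.DataFrame, n_bins: int = 10) -> pd.Series:
    """Equal-weighted D1 (lowest MAX) minus D10 (highest MAX), per formation month."""
    spreads = {}
    for month, df in panel.groupby("month"):
        if len(df) < n_bins:
            continue  # not enough stocks to fill every decile
        # Rank first so ties in MAX cannot collapse the quantile edges.
        decile = pd.qcut(df["max_ret"].rank(method="first"), n_bins, labels=False)
        means = df["next_ret"].groupby(decile).mean()
        spreads[month] = means.iloc[0] - means.iloc[-1]
    return pd.Series(spreads).sort_index()


def t_stat(x) -> float:
    """Plain i.i.d. t-statistic of the mean of a return series."""
    x = np.asarray(x, dtype=float)
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
```

Calling `t_stat(long_short_series(monthly_max_and_next_return(daily)))` would give the i.i.d. t-statistic of the monthly spread under these simplified rules.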
Four specifications are reported in full, with the primary chosen as the closest analog to the original paper that the available data supports:
- PRIMARY (value-weighted, NYSE breakpoints, stocks-only, winsorized). Stocks-only means ETFs, funds, and OTC/CBOE/PNK listings are excluded via the FMP company profile flags. NYSE breakpoints means each month’s decile cutoffs on MAX are computed using ONLY NYSE-listed stocks, then the full NYSE-plus-NASDAQ-plus-AMEX universe is assigned to deciles using those cutoffs. Value-weighted means each decile’s mean next-month return uses the FMP snapshot marketCap as the weight. Individual stock next-month returns are winsorized at [-0.90, +1.50] to neutralize a handful of extreme single-stock observations (biotech approvals, meme frenzies, and suspected split artifacts) that would otherwise distort specific months.
- Sensitivity A (equal-weighted, filtered, winsorized). Equal-weighted, universe-breakpoint, with the price ≥ $5 and dollar volume ≥ $1M filter and the same winsorization as primary.
- Sensitivity B (equal-weighted, filtered, no winsorization). Same filter as A without winsorization. Shows how much of the effect survives if the extreme single-stock tails are left in.
- Sensitivity C (equal-weighted, no filter, no winsorization). The rawest specification. No filter, no winsorization, equal-weighted, universe breakpoints. Maximum microcap noise.
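To make the primary's three ingredients concrete, here is a hedged single-month sketch of NYSE breakpoints, value weighting, and return winsorization. The column names (`max_ret`, `next_ret`, `exchange`, `market_cap`), the `"NYSE"` label, and the function name are assumptions for illustration; the actual script may organize this differently.

```python
import numpy as np
import pandas as pd


def primary_spread_one_month(df: pd.DataFrame, lo: float = -0.90,
                             hi: float = 1.50, n_bins: int = 10) -> float:
    """D1 minus D10 for one formation month under primary-style rules.

    `df` is assumed to hold one row per stock with hypothetical columns
    max_ret, next_ret, exchange, market_cap.
    """
    df = df.dropna(subset=["max_ret", "next_ret", "market_cap"]).copy()
    # Winsorize (clip) individual next-month returns at the fixed bounds.
    df["next_ret_w"] = df["next_ret"].clip(lower=lo, upper=hi)
    # Decile cutoffs are computed from NYSE-listed names only.
    nyse_max = df.loc[df["exchange"] == "NYSE", "max_ret"].to_numpy()
    cuts = np.quantile(nyse_max, np.linspace(0.1, 0.9, n_bins - 1))
    # Every stock, NYSE or not, is assigned a decile (1..n_bins) from those cutoffs.
    df["decile"] = np.searchsorted(cuts, df["max_ret"].to_numpy(), side="left") + 1
    # Value-weighted mean return within each decile.
    df["w_ret"] = df["market_cap"] * df["next_ret_w"]
    sums = df.groupby("decile")[["w_ret", "market_cap"]].sum()
    vw = sums["w_ret"] / sums["market_cap"]
    # Raises KeyError if an extreme decile is empty, which a real run should handle.
    return float(vw.loc[1] - vw.loc[n_bins])
```

Because the cutoffs come from NYSE names alone, the extreme deciles can hold very different numbers of stocks once NASDAQ and AMEX names are assigned, which is the point of the design: small, volatile non-NYSE names do not set the bin edges.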
Pre-registered verdict thresholds, committed before the script was first run:
- Replicated: mean(D1 - D10) > 0.007 and t > 2
- Degraded: 0 < mean ≤ 0.007 and t > 2
- Failed: mean ≤ 0 or t ≤ 2
- Inconclusive: a data quality issue prevents a clean call
The 0.007 floor is 70% of the original >1% claim.
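The rubric is simple enough to write down as a function. The sketch below is our illustration of the four thresholds as stated; the function name, the `data_ok` flag, and the upper-case labels are assumptions, not the script's actual interface.

```python
def verdict(mean: float, t: float, data_ok: bool = True,
            floor: float = 0.007, t_bar: float = 2.0) -> str:
    """Classify a D1 - D10 result (mean as a monthly decimal return)."""
    if not data_ok:
        return "INCONCLUSIVE"  # a data quality issue prevents a clean call
    if mean <= 0 or t <= t_bar:
        return "FAILED"
    if mean > floor:
        return "REPLICATED"
    return "DEGRADED"  # 0 < mean <= floor with t > t_bar
```

Note that the test on t is one-sided: a large negative t-statistic still lands in FAILED, because the rubric asks whether the original sign replicates, not whether any effect exists.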
Disclosed departures from a strict CRSP-quality replication
- The value-weighting uses a snapshot market cap from the FMP company profile dump, not a time-varying monthly market cap series. Over a 10-year sample, the cross-sectional ordering of market caps is approximately stable (large caps stay large), but the absolute weights drift. A time-varying VW would be stricter, and is the next robustness improvement we would make if this spec survived.
- Exchange listing is also a snapshot. A stock that moved between NASDAQ and NYSE during the sample is treated as fixed at its current exchange.
- A handful of observations in the raw cache are implausibly extreme and some are likely unadjusted split artifacts, which is why the winsorization is applied in the primary. We show both winsorized and unwinsorized sensitivities so the reader can see how much of the effect depends on this choice.
- The universe is the operator’s Numerai-aligned cache, not the full CRSP NYSE/AMEX/NASDAQ footprint. It is biased toward names that clear a minimum data-availability bar. After the stocks-only filter, the primary spec sorts a median of 3,067 stocks per month.
The numbers
Primary specification
Value-weighted, NYSE breakpoints, stocks-only, winsorized individual monthly returns.
| Metric | Value |
|---|---|
| Sample months | 122 |
| Median stocks / month | 3,067 |
| D1 (low MAX) mean monthly return | +1.193% |
| D10 (high MAX) mean monthly return | +2.994% |
| D1 minus D10 mean monthly | -1.801% |
| D1 minus D10 t-statistic | -2.57 |
| Annualized Sharpe of D1 minus D10 | -0.81 |
| Worst month | -27.48% |
| Best month | +14.24% |
The pre-registered call: mean ≤ 0, so the verdict is FAILED. The claim that the lowest-MAX decile beats the highest-MAX decile by more than 1% per month is not supported. The sign is inverted and the inversion clears the conventional |t| > 2 significance bar.
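As a consistency check on the table, the t-statistic, the sample length, and the annualized Sharpe ratio are tied together by simple arithmetic. The sketch below assumes the t-statistic is the plain i.i.d. one, mean / (sd / sqrt(n)), and that the Sharpe ratio is mean / sd scaled by sqrt(12) with no risk-free adjustment (the spread is a long-short return); both are our assumptions about how the table was built.

```python
import math


def implied_monthly_std(mean: float, t: float, n: int) -> float:
    """Back out the monthly standard deviation from mean, t, and n."""
    return abs(mean / t) * math.sqrt(n)


def annualized_sharpe_from_t(t: float, n: int, periods_per_year: int = 12) -> float:
    """Sharpe = (mean / sd) * sqrt(periods); mean / sd = t / sqrt(n)."""
    return t / math.sqrt(n) * math.sqrt(periods_per_year)


sd_pct = implied_monthly_std(-1.801, -2.57, 122)  # roughly 7.7% per month
sharpe = annualized_sharpe_from_t(-2.57, 122)     # about -0.81
```

Under those assumptions the reported mean, t-statistic, and Sharpe ratio are mutually consistent, and the implied monthly volatility of the spread is a little under 8%.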
Sensitivity A (equal-weighted, filtered, winsorized)
| Metric | Value |
|---|---|
| Median stocks / month | 2,507 |
| D1 minus D10 mean monthly | -2.113% |
| D1 minus D10 t-statistic | -3.95 |
This was the initial primary at first publication. The inversion is larger in equal-weighted form, as expected, because equal-weighting upweights the small-cap end of the decile where the effect is most extreme.
Sensitivity B (equal-weighted, filtered, no winsorization)
| Metric | Value |
|---|---|
| D1 minus D10 mean monthly | -4.849% |
| D1 minus D10 t-statistic | -4.12 |
| Worst month | -114.81% |
The unwinsorized spec is distorted by extreme single-stock tails in specific months (the -114% month is the signature of a decile bucket containing a stock with an unadjusted split-like return), which is exactly why the primary winsorizes. The direction of the effect is unchanged.
Sensitivity C (equal-weighted, no filter, no winsorization)
| Metric | Value |
|---|---|
| Median stocks / month | 3,462 |
| D1 minus D10 mean monthly | -1.824% |
| D1 minus D10 t-statistic | -1.81 |
On the fully raw universe, the point estimate is still negative and close to the primary in magnitude, but the t-stat drops below the |2| bar because microcap tail noise inflates the standard error. The mean is still negative, so the rubric still returns FAILED, and the result is consistent across all four specifications.
What this means
The original Bali, Cakici, and Whitelaw (2011) paper reported that from 1962 to 2005 investors overpaid for stocks that had recently delivered lottery-like payoffs, and that those stocks went on to earn lower subsequent returns, to the tune of more than 1% per month on the decile spread. In the 2016 to 2026 US equity market we tested, under the closest analog to the original methodology the available data supports, the effect has inverted. Stocks with the highest MAX in month t have, on average, outperformed stocks with the lowest MAX by roughly 1.8 percentage points in month t+1 on the value-weighted NYSE-breakpoint primary, and by between 1.8 and 4.8 percentage points on the equal-weighted sensitivities. The t-statistic of the primary is -2.57, which clears the conventional |t| > 2 significance bar but falls short of the stricter |t| > 3 hurdle that Harvey, Liu, and Zhu (2016) advocate for a multi-tested literature.
The important thing about this round is that the inversion is not a microcap artifact. Value-weighting the decile portfolios with NYSE-computed breakpoints was the obvious rebuttal to the initial equal-weighted result, and the effect survives it. Magnitude shrinks, as expected, but the sign and the significance both hold.
We flag three plausible drivers, none of which this run can pin down definitively.
- Retail flow. The rise of commission-free retail trading, fractional shares, and option buying in meme-like names has plausibly shifted the marginal price-setter in high-MAX stocks. If retail has been a persistent net buyer of lottery names at scale, the return pattern the original paper attributed to investors overpaying for lottery-like stocks could mechanically flip.
- Residual methodology gap. Our VW weights are snapshot marketCaps from the FMP profile dump, not time-varying. A strict time-varying VW on CRSP-quality data is the next robustness test, and it is the one piece of methodology still standing between this run and a genuinely like-for-like replication. We commit to adding it in a future update.
- Regime change. The 2016 to 2026 window includes historically low policy rates for much of 2016 to early 2022 (at the zero lower bound from March 2020), the tightening cycle that followed, COVID, the retail trading boom, and a persistent bull market in US growth names. Many published US equity factors have documented decay or reversal in similar windows. A sign flip is not unique to MAX.
By the pre-registered rubric, this is a failed replication. The additional finding that the sign has inverted survives the standard value-weighted NYSE-breakpoint rebuttal at conventional significance. The archive will track whether a strict time-varying VW on CRSP-quality data eventually brings the original sign back.
Update — 2026-04-11 — Value-weighted NYSE-breakpoint primary added
This verdict was first published on 2026-04-11 with an equal-weighted universe-breakpoint primary specification and a disclosed condition that would change the call: “A value-weighted NYSE-breakpoint run on CRSP-quality data reverses the sign back to positive and significant”. Later the same day, the FMP company profile dump was located in the operator’s databank with exchange listing, market cap, and ETF/fund flags, which made a VW NYSE-breakpoint spec tractable without any new API ingestion. It was added as the new primary, and the original equal-weighted spec was demoted to Sensitivity A.
Outcome of the robustness test. The value-weighted NYSE-breakpoint primary also returns FAILED with mean -1.801% per month and t = -2.57. The headline verdict does not change. The magnitude shrinks from -2.113% (equal-weighted) to -1.801% (value-weighted), which is expected, and the inversion is no longer a microcap story. The snapshot nature of the FMP market cap is the remaining methodological gap between this and a strict CRSP-quality replication, and is the pre-committed next step if we revisit this verdict again.
Reproducibility
The replication is a single Python file with no custom dependencies beyond pandas and numpy. It reads the operator’s pre-pickled 10-year daily OHLCV cache and a parquet of FMP company profiles, computes monthly MAX and next-month returns per symbol, merges exchange and market cap, forms NYSE-breakpoint deciles, and runs all four specifications. Total runtime on a laptop is about 23 seconds.
- Script: scripts/verdicts/bali_cakici_whitelaw_2011_max.py (Nullberg repository)
- Results JSON: scripts/verdicts/bali_cakici_whitelaw_2011_max.results.json
- Monthly long-short series CSVs: bali_cakici_whitelaw_2011_max.monthly_ls_primary.csv (VW NYSE), ..._sensA.csv, ..._sensB.csv, ..._sensC.csv
A public GitHub mirror with the replication notebooks is being set up. In the interim the files are committed to the Nullberg site repository and will be moved to the public repo on the same path.
What we will track from here
This verdict enters the living archive as failed and stays there until at least one of the following happens. If it does, the entry is updated, a dated changelog is appended, and the old call is kept visible.
- A strict time-varying value-weighted run on CRSP-quality data with properly updated monthly market caps reverses the sign.
- The sign flips again in forward months as the universe updates.
- Additional robustness tests (sub-sample stability, industry-controlled, volatility-controlled, size-decile-controlled) materially change the conclusion.
Update — 2026-04-11 — Newey-West and sub-sample stability backfill
The Four verdicts, four shapes primer committed to reporting Newey-West HAC standard errors and sub-sample stability on every verdict going forward. This section applies the same analysis to the MAX verdict’s VW NYSE-breakpoint primary monthly long-short series (bali_cakici_whitelaw_2011_max.monthly_ls_primary.csv). No replication was re-run; the numbers below come from the monthly series produced by the original run.
Newey-West robustness
The Newey-West (1987) HAC long-run variance estimator with a Bartlett kernel and 12 lags (a common window for monthly factor data) gives the following for the VW NYSE-breakpoint primary:
| Metric | Value |
|---|---|
| Sample months | 122 |
| Mean monthly D1 - D10 | -1.801% |
| i.i.d. t-statistic | -2.57 |
| Newey-West 12-lag t-statistic | -2.18 |
The Newey-West adjustment raises the standard error by approximately 18%, which shrinks |t| from 2.57 to 2.18. The sign inversion still clears |t| > 2 under the stricter HAC standard. The inversion is robust to the autocorrelation adjustment.
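For readers who want to reproduce the adjustment from the monthly series, here is a minimal NumPy sketch of a Newey-West t-statistic for a mean with Bartlett weights. The function name is ours, and small-sample conventions vary between implementations (for example dividing by n versus n - 1), so it may not match the reported figure to the last digit.

```python
import numpy as np


def newey_west_t(x, lags: int = 12) -> float:
    """t-statistic of the mean using a Bartlett-kernel HAC variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    e = x - x.mean()
    lrv = e @ e / n  # lag-0 autocovariance
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)  # Bartlett weight, declines linearly
        gamma_k = e[k:] @ e[:-k] / n     # lag-k autocovariance
        lrv += 2.0 * weight * gamma_k
    se = np.sqrt(lrv / n)                # standard error of the mean
    return float(x.mean() / se)
```

With `lags=0` this collapses to the i.i.d. t-statistic (using the population variance). The roughly 18% figure above is just the ratio of the two reported t-statistics, 2.57 / 2.18 ≈ 1.18, since the mean is the same in both.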
Sub-sample stability
Splitting the 122-month sample at its midpoint:
| Half | Months | First month | Last month | Mean monthly | i.i.d. t |
|---|---|---|---|---|---|
| First half | 61 | 2016-01 | 2021-01 | -2.162% | -2.29 |
| Second half | 61 | 2021-02 | 2026-02 | -1.440% | -1.38 |
The inversion is stronger in the first half of the sample. 2016-01 through 2021-01 shows a significant negative spread (|t| > 2), while the 2021-02 through 2026-02 second half shows a smaller magnitude and a t-statistic that does not reach the significance bar. The first half corresponds to the late-cycle growth-stock dominance regime leading into and through the first year of COVID; the second half includes the 2022 rate-shock repricing and the subsequent partial normalization. The stronger inversion in the growth-heavy first half is consistent with, though it does not establish, the hypothesis that retail lottery-seeking was strongest in the low-rate era.
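The split itself is a few lines of NumPy. The sketch below is illustrative only; the function name and the return shape are our choices, and it takes the monthly long-short series as a plain array.

```python
import numpy as np


def split_half_stats(x):
    """Return (n, mean, i.i.d. t) for the first and second half of a series."""
    x = np.asarray(x, dtype=float)
    mid = len(x) // 2
    stats = []
    for half in (x[:mid], x[mid:]):
        se = half.std(ddof=1) / np.sqrt(len(half))
        stats.append((len(half), float(half.mean()), float(half.mean() / se)))
    return stats
```

Because the two halves have equal length (61 months each), their means must average to the full-sample mean: (-2.162% + -1.440%) / 2 = -1.801%, which matches the primary.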
Verdict impact
No change. The pre-registered rubric was mean ≤ 0 OR t ≤ 2 → FAILED. The primary mean is -1.801% (≤ 0), so the verdict would be FAILED regardless of the t-statistic. The Newey-West adjustment does not change that. What the backfill adds is evidence that the failure is robust (it holds under HAC) and regime-concentrated (it was stronger in the first half of the sample than the second). A reader deciding whether to put weight on the inversion as a forward-looking claim should note that the magnitude is smaller in the more recent half, which is consistent with the regime hypothesis in the synthesis primer but does not by itself distinguish between regime change and mean-reversion.
What changed in the archive’s process
This is the first example of the retrospective Newey-West and sub-sample backfill committed in the Four verdicts, four shapes primer. Momentum and profitability were backfilled in the same pass. Going forward, every new verdict will include these diagnostics at first publication, not as a retrospective update.
Bibliography
- Bali, Turan G., Nusret Cakici, and Robert F. Whitelaw. “Maxing Out: Stocks as Lotteries and the Cross-Section of Expected Returns.” Journal of Financial Economics 99(2), 2011, pp. 427-446.
- Newey, Whitney K., and Kenneth D. West. “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica 55(3), 1987, pp. 703-708. The HAC standard error estimator used for the Newey-West t-statistic in this update.