The Predictability Ceiling
Salganik-style forecasting exercise: how well can any model predict which names enter the SSA top 100?
5.11 The Predictability Ceiling
Following Salganik et al. (MusicLab, 2006), we ask: how predictable is baby name success? Models are trained on 2004-2014 data and tested on whether a name entered the SSA top 100 during 2015-2024.
| Model | AUC | PR-AUC | Brier | P@50 | P@100 | P@200 |
|---|---|---|---|---|---|---|
| Baseline A: Top-500 rule | 0.982 | 0.199 | 0.018 | 0.200 | 0.230 | 0.210 |
| Baseline B: AR(1) rank | 0.997 | 0.745 | 0.003 | 1.000 | 0.980 | 0.770 |
| Full: Logistic Regression | 0.999 | 0.881 | 0.008 | 0.980 | 0.980 | 0.900 |
Interpretation
The best model achieves AUC = 0.999, suggesting surprisingly high predictability. This may reflect that structural features (prior popularity, phonetic properties) constrain the possibility space more than Salganik's framework would predict.
The full model improves over the simple top-500 baseline by 0.017 AUC points, suggesting that cultural, phonetic, and adoption-dynamic features carry real predictive power beyond mere historical popularity.
Training: 58409 names from 2004-2014. Positive class: 271 names that entered top 100 during 2015-2024 (0.5% base rate).