Our Research
Most baby name sites hand you a list sorted alphabetically and wish you luck. We took a different approach: build a research pipeline that treats naming the way an academic would — with real data, rigorous methodology, and genuine curiosity about why certain names capture a generation's imagination.
We don't just list names — we study how names move through culture.
What we're studying
The Namesake Cultural Diffusion Study asks a question that sounds simple but has never been properly answered: when a character appears in a hit film, a royal baby is born, or an athlete breaks a record, how does that cultural moment ripple through actual baby-naming decisions?
The dominant academic position (Lieberson 2000, Hahn & Bentley 2003) holds that names change through a self-reinforcing fashion cycle — essentially random drift — and that cultural events are post-hoc rationalizations. A competing view says cultural shocks have real, measurable causal effects on naming. Our dataset lets us put both theories on trial with evidence neither side has had access to.
This isn't an abstract exercise. The answers directly improve the recommendations Namesake gives you: understanding which names are riding a cultural wave (and which waves crest quickly) helps parents choose names they'll still love in five years.
Methodology
Our study is organized around five established research frameworks, each contributing a specific analytical component. We chose these because each one produces a concrete, testable number — not just a narrative.
- Neutral drift (Lieberson null model): establishes how much name change happens without any cultural cause. This is the baseline everything else is measured against.
- Phonetic fashion: names spread by sound, not spelling. The unit of contagion is the onset phoneme, not the name string itself.
- Hawkes process: quantifies the half-life of a cultural shock and its branching ratio, i.e., how long and how strongly a naming trend echoes.
- Bass diffusion: separates broadcast-driven adoption (you saw the movie) from peer-driven adoption (your friend named their kid that).
- Synthetic control: builds per-event counterfactuals, showing what would have happened to a name if the cultural event had never occurred.
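Half-life and branching ratio are standard quantities of a Hawkes (self-exciting) point process. The minimal sketch below assumes an exponential excitation kernel, alpha·exp(-beta·t). The kernel shape and the parameter names `mu`, `alpha` and `beta` are illustrative assumptions, not the study's fitted specification.

```python
import math

def intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity: baseline mu plus a decaying boost from every earlier event."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)

def branching_ratio(alpha, beta):
    """Expected number of direct follow-on events per event (the kernel's integral, alpha / beta)."""
    return alpha / beta

def kernel_half_life(beta):
    """Time for one event's excitation to decay to half its initial size."""
    return math.log(2) / beta

def mean_cascade_size(alpha, beta):
    """Expected total events per exogenous shock, counting every generation of echoes.
    Finite only when the process is subcritical (branching ratio < 1)."""
    n = branching_ratio(alpha, beta)
    if n >= 1:
        raise ValueError("supercritical process: cascades do not die out on average")
    return 1.0 / (1.0 - n)
```

With alpha = 0.6 and beta = 1.2 (per year), each shock directly triggers 0.5 follow-on events. The excitation halves every ln 2 / 1.2 ≈ 0.58 years, and a shock produces 1 / (1 - 0.5) = 2 events in total on average.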
Supporting methods include Granger causality for lead-lag structure between search interest and births, Hill-curve saturation modeling for the “Blockbuster Paradox” (do mega-hits actually produce fewer namesakes?), and survival analysis for name lifespan after a cultural event.
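A Hill curve is a standard saturating dose-response function. In this sketch, `exposure` stands in for whatever attention measure the study uses (box office, search volume, and so on), and `e_max`, `k` and `h` are hypothetical parameters. The sketch shows that saturation alone implies fewer namesakes per unit of exposure. A plain Hill curve is monotone, however, so on its own it can produce a plateau but not an absolute decline.

```python
def hill(exposure, e_max, k, h):
    """Hill saturation: the response rises with exposure and levels off at e_max.
    k is the exposure that yields half of e_max; h sets how sharp the transition is."""
    if exposure <= 0:
        return 0.0
    return e_max * exposure**h / (k**h + exposure**h)

def yield_per_unit(exposure, e_max, k, h):
    """Average response per unit of exposure, which falls once the curve saturates."""
    return hill(exposure, e_max, k, h) / exposure
```

With `e_max=1000, k=50, h=2`, an exposure of 50 yields 500 (10 per unit). Ten times that exposure yields about 990 (under 2 per unit).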
Questions we're answering
- How much of name turnover is pure fashion drift vs. driven by identifiable cultural events?
- Does Google search interest actually predict SSA birth registrations — and with what lag?
- When a cultural shock hits a name, what is its half-life? Does a royal baby produce a longer echo than a hit film?
- Do blockbuster movies with hyper-exposure actually produce fewer namesakes than moderately popular films?
- When a name spikes, how much of the new naming mass lands on that exact name vs. its phonetic neighbors?
- Can we build per-event counterfactuals — what would "Arya" look like if Game of Thrones had never aired?
- Is there a predictability ceiling for name popularity, and how close can any model get?
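Survival analysis is one way to put a number on the half-life question. The minimal Kaplan-Meier sketch below rests on two assumptions about data layout, both our own. First, each name-event pair is reduced to a duration: the years until the name falls back to its pre-event level. Second, a flag records whether that fall was actually observed or the record is censored, meaning the name is still elevated when the data ends. The definition of "falls back" is also our assumption.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time a return-to-baseline was observed.
    durations: years each name stayed elevated after its cultural event.
    observed: True if the name was seen returning to baseline, False if censored."""
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = removed = 0
        while i < len(pairs) and pairs[i][0] == t:  # group ties at the same duration
            events += pairs[i][1]
            removed += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored names leave the risk set without lowering survival
    return curve

def median_survival(curve):
    """First time survival drops to 0.5 or below (None if it never does)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

The curve's median, the time by which half of the elevated names have returned to baseline, is one survival-analysis reading of a trend's half-life.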
Data sources
Our analysis draws on both internal datasets we've built and established external sources. Together they form what we believe is the most comprehensive name-culture dataset assembled for this kind of study.
| Source | What it provides | Scale |
|---|---|---|
| SSA birth registrations | Annual name counts, 1880-2024 | 2.1M rows |
| Google Trends | Weekly search interest for ~25K names | 6.6M rows |
| Cultural event attribution | Spike detection + film/TV/celebrity attribution | Thousands of events |
| CMU Pronouncing Dictionary | Phoneme decomposition (ARPAbet) | 126K entries |
| Phoneme analysis (neural) | g2p_en fallback for names not in CMU | 43K names |
| SSA state-level data | Geographic diffusion patterns | ~6M rows |
| OMDb / TMDb | Film & TV metadata, cast, box office | 22K+ titles |
| Google Books Ngrams | Pre-internet cultural baseline, 1800-2019 | Millions of tokens |
| Wikipedia pageviews | Independent cultural attention signal | 2015-present |
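The cultural event attribution row starts from spike detection, and the study's detector isn't described here. The minimal sketch below makes its own assumption: a spike is a year whose log change in births sits more than `z` standard deviations above that name's other year-over-year changes. Both the threshold and the log-change definition are assumptions.

```python
import math
import statistics

def detect_spikes(years, counts, z=3.0):
    """Flag years where a name's log growth is an outlier relative to its own history.
    years and counts are parallel lists of annual birth counts for one name."""
    growth = [math.log(counts[i] + 1) - math.log(counts[i - 1] + 1) for i in range(1, len(counts))]
    if len(growth) < 3:
        return []
    spikes = []
    for i, g in enumerate(growth):
        others = growth[:i] + growth[i + 1:]  # leave one out so a spike can't inflate its own baseline
        mu = statistics.mean(others)
        sd = statistics.stdev(others)
        if sd > 0 and (g - mu) / sd > z:
            spikes.append(years[i + 1])
    return spikes
```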
Pipeline status
Our research pipeline is a multi-phase system that moves from raw data acquisition through statistical modeling to final findings. Here's where each phase stands.
Research infrastructure: structured logging, resumable processing, streaming parquet I/O, rate limiting, and database connectivity.
SSA records (2.1M rows), search trends (7.4M weekly observations), spike events (1,335 detected anomalies), and enrichment data exported to versioned parquet.
13 external sources acquired: CMU phonemes (134K), SSA national + state-level (8.1M rows), Google Books Ngrams (5.8M), TMDb titles + cast (5.1K), CDC natality, GDELT mentions, place names, and more. Google Trends fetch ~81% complete.
All 43,334 SSA names decomposed to ARPAbet phonemes — 22% from CMU dictionary, 78% via neural g2p model. This enables 'names that sound like X' analysis.
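The dictionary-first, model-fallback lookup described above could look like the sketch below. It assumes `cmu` is a dict mapping uppercase names to ARPAbet phoneme lists, and that `fallback` is any callable returning phonemes for names the dictionary lacks. An instance of the `g2p_en` package's `G2p` class is one such callable, though its output format should be checked before use. The tiny dictionary and stub model are hand-written for illustration; they are not copied from CMU. `onset` strips ARPAbet stress digits and returns the first phoneme.

```python
from typing import Callable, Dict, List, Tuple

Phones = List[str]

def to_phonemes(name: str, cmu: Dict[str, Phones],
                fallback: Callable[[str], Phones]) -> Tuple[Phones, str]:
    """Try the pronouncing dictionary first; use the neural model only on a miss.
    Also returns which source answered, so dictionary vs. model coverage can be reported."""
    key = name.upper()
    if key in cmu:
        return cmu[key], "cmu"
    return fallback(name), "g2p"

def strip_stress(phones: Phones) -> Phones:
    """ARPAbet vowels carry stress digits (AH0, EY1); drop them so sounds compare by identity."""
    return [p.rstrip("012") for p in phones]

def onset(phones: Phones) -> str:
    """First phoneme of the name, with stress removed."""
    return strip_stress(phones)[0] if phones else ""

# Hypothetical hand-written entries and a stub model, for illustration only.
demo_cmu = {"AIDEN": ["EY1", "D", "AH0", "N"], "JAYDEN": ["JH", "EY1", "D", "AH0", "N"]}

def stub_g2p(name: str) -> Phones:
    return ["B", "R", "EY1", "D", "AH0", "N"]  # pretend model output for "Brayden"
```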
Annual panel (1.96M rows, 104,819 names, 1880–2024), weekly panel (7.4M rows, 27,577 names), and event panel structure built. 55.6% search coverage, 100% phonetic density.
Lieberson neutral drift + phonetic fashion null models calibrated and validated. Female N_e ≈ 9,850, male N_e ≈ 22,320. 84% of names are drift-consistent; 1.3% show strong cultural influence — the signal we use to classify names.
34M-edge phonetic neighborhood graph built. Phonetic neighbors co-move significantly (r=0.18 vs 0.07 random control, p=1.1×10⁻¹²). 1,221 phonetic clusters identified.
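Elsewhere on this page, phonetic neighbors are described as names differing by one or two sounds. The minimal sketch below turns that into graph edges, assuming neighbors are pairs whose stress-stripped phoneme sequences are within Levenshtein distance 2. The transcriptions are hand-written illustrations, and the all-pairs loop is for demonstration only; a 34M-edge graph would need candidate blocking or indexing.

```python
from itertools import combinations

def phoneme_distance(a, b):
    """Levenshtein distance between phoneme sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete pa
                            curr[j - 1] + 1,            # insert pb
                            prev[j - 1] + (pa != pb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def neighbor_edges(names_to_phones, max_distance=2):
    """Pairs of names whose pronunciations differ by at most max_distance phonemes.
    Distance 0 means the spellings are homophones. Brute force over all pairs."""
    edges = []
    for (n1, p1), (n2, p2) in combinations(sorted(names_to_phones.items()), 2):
        d = phoneme_distance(p1, p2)
        if d <= max_distance:
            edges.append((n1, n2, d))
    return edges

# Hypothetical stress-stripped transcriptions, for illustration only.
demo_phones = {
    "Aiden": ["EY", "D", "AH", "N"],
    "Jayden": ["JH", "EY", "D", "AH", "N"],
    "Brayden": ["B", "R", "EY", "D", "AH", "N"],
    "Olivia": ["OW", "L", "IH", "V", "IY", "AH"],
}
```

Here Aiden and Jayden sit one sound apart, Brayden is two away from both, and Olivia has no neighbors.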
Granger causality complete: search interest predicts births for 21.5% of names (p < 0.05), with film events showing positive effects and news events showing avoidance. Hawkes process and Bass diffusion fits in progress.
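Bass diffusion splits adoption into two standard coefficients. `p` (innovation) is adoption driven by an outside source such as a film; `q` (imitation) is adoption driven by people who have already adopted. The minimal discrete-time sketch below illustrates that split. The market size `m` and the parameter values are hypothetical, and the study's fitting procedure isn't described here.

```python
import math

def bass_path(p, q, m, periods):
    """Discrete-time Bass diffusion. Returns per-period (broadcast, peer) adopter counts.
    broadcast = p * remaining pool; peer = q * (share already adopted) * remaining pool."""
    adopted = 0.0
    path = []
    for _ in range(periods):
        remaining = m - adopted
        broadcast = p * remaining
        peer = q * (adopted / m) * remaining
        path.append((broadcast, peer))
        adopted += broadcast + peer
    return path

def peak_period_continuous(p, q):
    """Time of peak adoption in the continuous Bass model, ln(q/p) / (p + q); valid when q > p."""
    return math.log(q / p) / (p + q)
```

With p = 0.03 and q = 0.38, the first period is entirely broadcast-driven. Within a couple of periods, peer-driven adoption overtakes it, and total adoption peaks at about 6.2 periods.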
Synthetic control counterfactuals for the 200 best-attributed cultural events, plus the Hill-curve Blockbuster Paradox analysis.
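Synthetic control builds each counterfactual as a weighted blend of "donor" names the event didn't touch. The weights are chosen so the blend tracks the treated name before the event. The minimal sketch below assumes the classic formulation (nonnegative weights summing to one), fitted by projected gradient descent. Donor selection and any covariate matching the study may use aren't described here, so they are omitted.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1} (sort-and-threshold method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def fit_synthetic_weights(treated_pre, donors_pre, iters=2000):
    """Minimize squared pre-event error between the treated name and a convex blend of donors.
    treated_pre: treated name's pre-event series; donors_pre: one pre-event series per donor."""
    k, horizon = len(donors_pre), len(treated_pre)
    w = [1.0 / k] * k
    # 2 * (sum of squared donor values) bounds the gradient's Lipschitz constant, so this step is safe.
    step = 1.0 / (2.0 * sum(x * x for d in donors_pre for x in d))
    for _ in range(iters):
        resid = [sum(w[j] * donors_pre[j][t] for j in range(k)) - treated_pre[t] for t in range(horizon)]
        grad = [2.0 * sum(donors_pre[j][t] * resid[t] for t in range(horizon)) for j in range(k)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(k)])
    return w

def counterfactual(weights, donors_post):
    """Post-event path the treated name would have followed, according to the fitted blend."""
    return [sum(w * d[t] for w, d in zip(weights, donors_post)) for t in range(len(donors_post[0]))]
```

The estimated event effect in each post-event year is the observed count minus this counterfactual.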
What predicts which events drive naming? Event type, name origin, phonetic neighborhood, generation-cycle position — the full Lieberson decomposition.
Spatial diffusion across U.S. states and the Salganik predictability ceiling — how well can any model forecast name popularity?
Key findings so far
Phases 5 through 7a of our analysis are complete: the Lieberson null model, phonetic spillover analysis, and Granger causality tests have been run against 145 years of SSA data and 20 years of Google Trends. Here's what we've found.
When we simulate 145 years of name popularity under pure neutral drift — the Lieberson hypothesis that naming is essentially random fashion cycling — about 2% of names beat the model in any given year. But across their full history, 64.6% of names with 50+ years of data beat neutral drift at the 99th percentile in at least one year. Random drift is real, but it isn't the whole story.
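Here is a minimal sketch of a one-step neutral-drift threshold. It assumes a Wright-Fisher model in which each year's births resample the previous year's name shares binomially, with `n_e` draws (the effective population size). This is a simplification for illustration: it ignores the invention of new names and multi-year dynamics, and the study's calibrated simulation may differ. Under this sketch, a name "beats" the null in a year if its observed share exceeds the 99th percentile of the resampled share.

```python
import math

def drift_quantile_share(share, n_e, q=0.99):
    """q-quantile of next year's share under Wright-Fisher binomial resampling.
    Next year's count ~ Binomial(n_e, share); returns that quantile divided by n_e."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must be strictly between 0 and 1")
    log_p, log_1mp = math.log(share), math.log1p(-share)
    log_nfact = math.lgamma(n_e + 1)
    cdf = 0.0
    for k in range(n_e + 1):
        # Binomial pmf in log space to avoid underflow for large n_e.
        log_pmf = (log_nfact - math.lgamma(k + 1) - math.lgamma(n_e - k + 1)
                   + k * log_p + (n_e - k) * log_1mp)
        cdf += math.exp(log_pmf)
        if cdf >= q:
            return k / n_e
    return 1.0

def beats_drift(prev_share, next_share, n_e, q=0.99):
    """True if a name grew by more than neutral drift plausibly allows at quantile q."""
    return next_share > drift_quantile_share(prev_share, n_e, q)
```

For a name holding 1% of births and the female N_e reported below (about 9,852), the threshold works out to roughly 1.24%. Growth to 2% beats drift; growth to 1.01% does not.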
Names like Xander (beating the 99th-percentile threshold in 66% of years) and Nevaeh (52%) show non-drift-like trajectories across decades. Traditional names with deep cultural roots, such as Paul, Elizabeth, and Robert, also beat the model in around 50% of years, because their persistence is itself culturally enforced, not random. Single-event spikes (Khaleesi, Loki, Elden) don't beat the model on this metric; detecting those requires per-event attribution, which is the next phase.
The effective population size for female names (N_e ≈ 9,852) is less than half that of male names (N_e ≈ 22,320). In plain English: the pool of actively competing girl names is smaller, so individual names rise and fall faster. Parents naming daughters are measurably more fashion-sensitive than parents naming sons.
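Why a smaller N_e means faster rises and falls: under the Wright-Fisher assumption behind the neutral null, a name holding share p changes by a random amount each generation. Taking N_e as the number of draws per generation, that change has standard deviation sqrt(p(1 - p) / N_e). Plugging in the two reported values (the 1% share is an arbitrary example):

```python
import math

N_E_FEMALE = 9_852   # effective population size reported above for girls' names
N_E_MALE = 22_320    # effective population size reported above for boys' names

def drift_sd(share, n_e):
    """Standard deviation of one generation's change in share under Wright-Fisher drift."""
    return math.sqrt(share * (1.0 - share) / n_e)

female = drift_sd(0.01, N_E_FEMALE)
male = drift_sd(0.01, N_E_MALE)
ratio = female / male  # equals sqrt(N_E_MALE / N_E_FEMALE), whatever share is chosen
```

The ratio comes out at about 1.5. At the same share, a girl's name wanders roughly one and a half times as far per step from drift alone.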
Phonetic neighbors (names differing by one or two sounds) co-move significantly more than random pairs (mean r=0.18 vs 0.07 control, Welch t=7.21, p=1.1×10⁻¹²). That is roughly 2.5 times the baseline correlation: when a cultural event lifts one name, part of that lift tends to show up in the names that sound like it. We identified 1,221 distinct phonetic clusters across 43,334 names.
Granger causality tests on 4,185 names show that search interest predicts SSA birth registrations for 21.5% of names (p < 0.05), with a median optimal lag of 1 year. Film and celebrity events produce positive impulse responses (+0.0063 at 1-year horizon), while news events produce negative responses — parents actively avoid names in the news.
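A Granger test asks whether last period's search interest improves a forecast of this period's births beyond what last period's births already explain. The minimal one-lag sketch below writes out ordinary least squares by hand. The lag of 1 echoes the median optimal lag reported above. The study's lag selection, any detrending, and its exact test statistic aren't specified here. `statsmodels.tsa.stattools.grangercausalitytests` is the usual production tool.

```python
def ols_rss(y, columns):
    """Residual sum of squares from regressing y on the given columns plus an intercept."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination with pivoting.
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)]
         + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))

def granger_f(births, search):
    """F statistic for 'search[t-1] helps predict births[t]' at lag 1."""
    y = births[1:]
    y_lag, x_lag = births[:-1], search[:-1]
    rss_restricted = ols_rss(y, [y_lag])        # births explained by their own past only
    rss_full = ols_rss(y, [y_lag, x_lag])       # ...plus last period's search interest
    n, k_full = len(y), 3                       # intercept + lagged births + lagged search
    return (rss_restricted - rss_full) / (rss_full / (n - k_full))
```

A large F (compared against an F(1, n - 3) distribution) is evidence that search leads births.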
Panel VAR impulse response functions show film-character names have 2-7× larger positive effects than baseline. News-event names show the opposite: a measurable avoidance effect. This 'aspirational vs cautionary' split is the empirical foundation for how Namesake distinguishes trending names from names merely in the news.
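Impulse responses come from iterating a fitted vector autoregression forward after a one-time shock. The minimal sketch below uses a VAR(1) with a hypothetical 2×2 coefficient matrix over (search interest, births). It computes non-orthogonalized responses and does not reflect how the study's panel VAR was estimated or identified.

```python
def impulse_response(coef, shock, horizons):
    """Responses of each variable at h = 0..horizons after a one-time shock,
    for the model x_t = coef @ x_{t-1} + e_t."""
    state = list(shock)
    responses = [state]
    for _ in range(horizons):
        state = [sum(a * s for a, s in zip(row, state)) for row in coef]
        responses.append(state)
    return responses

# Hypothetical coefficients. Row i shows how last period's values feed variable i.
# Variables: [search_interest, births]; births load positively on last period's search.
COEF = [[0.5, 0.0],
        [0.2, 0.5]]
```

A shock to search of size 1 raises births by 0.2 after one step and 0.2 after two. A negative entry in the births row would instead produce an avoidance-style response.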
The full threshold computation covers 46,412 names from 1880–2024, generating 1,950,660 per-name-year threshold rows. Every SSA name with at least a decade of history now has an empirical answer to the question 'is this name's trajectory explainable by fashion drift?'
We matched SSA names against 220 years of English fiction via Google Books Ngrams (1800–2019). Names that appeared frequently in published fiction before the internet era were already part of the internal fashion process — this prevents us from falsely attributing centuries-old trends to modern media.
These findings span Phases 5–7a. As the Hawkes process modeling, Bass diffusion, and synthetic control analysis complete, we'll add findings on trend half-lives, broadcast vs. peer-driven adoption, the Blockbuster Paradox, and per-event causal impact.
The strongest non-drift signals
These 24 names (20+ years of SSA history) have the highest share of years beating our neutral-drift null at the 99th percentile. Traditional classics dominate — their decades-long persistence at top ranks isn't what random fashion drift predicts.
Note: this metric catches sustained non-drift behavior, not one-off spikes. Names like Khaleesi, Elden, or Loki have brief spikes that are real but too short to beat the model on this measure — those surface through the separate cultural attribution pipeline.
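The ranking metric is the share of a name's years in which it beat the 99th-percentile threshold. The minimal sketch below assumes per-year exceedance flags are already computed. The second function gives a chance reference point for the "at least one year" statistic in the key findings: by construction, even a perfectly drift-like name crosses a 99th-percentile threshold about 1% of the time. The independence assumption is ours; real name-years are autocorrelated, so treat it as a rough guide.

```python
def share_of_years_beating(flags, min_years=20):
    """Fraction of a name's years flagged as beating the drift threshold; None if history is too short."""
    if len(flags) < min_years:
        return None
    return sum(flags) / len(flags)

def chance_at_least_once(years, q=0.99):
    """Probability that a purely drift-driven name beats the q-quantile at least once
    in `years` independent years."""
    return 1.0 - q ** years
```

For 50 independent years, `chance_at_least_once(50)` is about 0.39. The expected share rises with the length of a name's history.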
Published reports
A data analysis of 843 cultural events and their impact on baby naming across 145 years of SSA birth records
Data sources, analytical methods, and limitations for the Namesake research pipeline.
Full-scale Phase 5 analysis: 1.95M threshold rows across 46,412 names, 145 years of SSA data. Neutral drift + phonetic fashion null models with Wright-Fisher calibration.
Phase 6: phonetic neighborhood cross-correlation, clusters, Granger tests, and spillover magnitudes vs SSA panel.
Why this matters for parents
This research isn't just academic curiosity — it directly shapes the recommendations you see on Namesake. Understanding the mechanics of name trends means we can tell you things no other baby name site can:
- Whether a name you love is riding a cultural wave that's likely to crest (or has already peaked).
- Which names are trending because of genuine, broad cultural shift vs. a single event that will fade.
- How a name's sound profile connects it to a neighborhood of similar names — and whether that neighborhood is growing or shrinking.
- The difference between a name that's gaining because everyone saw the same movie and one that's spreading organically through communities.
The Namesake Cultural Diffusion Study is an ongoing research effort. For questions or collaboration inquiries, reach us at hello@namesake.baby.