Our Research

Data current through 2024

Most baby name sites hand you a list sorted alphabetically and wish you luck. We took a different approach: we built a research pipeline that treats naming the way an academic would, with real data, rigorous methodology, and genuine curiosity about why certain names capture a generation's imagination.

We don't just list names — we study how names move through culture.

104,819 names analyzed
2.1M historical records
5.8M pre-internet data points
145 years of SSA data (1880–2024)

What we're studying

The Namesake Cultural Diffusion Study asks a question that sounds simple but has never been properly answered: when a character appears in a hit film, a royal baby is born, or an athlete breaks a record, how does that cultural moment ripple through actual baby-naming decisions?

The dominant academic position (Lieberson 2000, Hahn & Bentley 2003) holds that names change through a self-reinforcing fashion cycle — essentially random drift — and that cultural events are post-hoc rationalizations. A competing view says cultural shocks have real, measurable causal effects on naming. Our dataset lets us put both theories on trial with evidence neither side has had access to.

This isn't an abstract exercise. The answers directly improve the recommendations Namesake gives you: understanding which names are riding a cultural wave (and which waves crest quickly) helps parents choose names they'll still love in five years.

Methodology

Our study is organized around five established research frameworks, each contributing a specific analytical component. We chose these because each one produces a concrete, testable number — not just a narrative.

Neutral drift null model · Sociology / population genetics

Establishes how much name change happens without any cultural cause — the baseline everything else is measured against.
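To make the idea of a drift baseline concrete, here is a minimal sketch of a neutral copying model in the spirit of Hahn & Bentley (2003). It assumes a fixed number of births per generation and a constant chance of inventing a new name. The function name, parameters, and values are illustrative assumptions, not the study's calibrated model.

```python
import random
from collections import Counter

def simulate_neutral_drift(pop_size, generations, innovation_rate, seed=0):
    """Each generation, every baby copies a name at random from the previous
    generation, or (with probability innovation_rate) receives a brand-new name."""
    rng = random.Random(seed)
    names = list(range(pop_size))      # start with every baby uniquely named
    next_new_name = pop_size           # integer IDs stand in for name strings
    history = []
    for _ in range(generations):
        new_names = []
        for _ in range(pop_size):
            if rng.random() < innovation_rate:
                new_names.append(next_new_name)
                next_new_name += 1
            else:
                new_names.append(rng.choice(names))
        names = new_names
        history.append(Counter(names))
    return history

# Hypothetical settings chosen only to keep the example fast.
history = simulate_neutral_drift(pop_size=500, generations=50, innovation_rate=0.01)
final = history[-1]
top_name, top_count = final.most_common(1)[0]
print(f"{len(final)} distinct names survive; the most common holds {top_count / 500:.1%}")
```

Even with no cultural cause at all, some names rise and others vanish in this simulation, which is why a baseline of this kind is needed before crediting any event.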

Phonetic neighborhoods · Psycholinguistics (Berger et al.)

Names spread by sound, not spelling. The unit of contagion is the onset phoneme, not the name string itself.
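One way to picture a phonetic neighborhood is as the set of names within a small edit distance of each other when written as phoneme sequences. The sketch below assumes a Levenshtein distance over ARPAbet symbols and a cutoff of two edits; the transcriptions are hand-written for illustration and the helper names are hypothetical.

```python
def phoneme_distance(a, b):
    """Levenshtein distance over phoneme sequences (lists of ARPAbet symbols)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Hand-written, illustrative transcriptions (stress markers omitted).
PHONEMES = {
    "Aiden":   ["EY", "D", "AH", "N"],
    "Jayden":  ["JH", "EY", "D", "AH", "N"],
    "Brayden": ["B", "R", "EY", "D", "AH", "N"],
    "Olivia":  ["OW", "L", "IH", "V", "IY", "AH"],
}

def neighbors(name, max_distance=2):
    """Names whose phoneme sequence is within max_distance edits of `name`."""
    return sorted(
        other for other in PHONEMES
        if other != name
        and phoneme_distance(PHONEMES[name], PHONEMES[other]) <= max_distance
    )

print(neighbors("Aiden"))   # ['Brayden', 'Jayden']
```

Working on phonemes rather than letters means spellings such as Aiden and Jayden land next to each other even though their strings share no prefix.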

Hawkes self-exciting processes · Seismology / social cascades

Quantifies the "half-life" of a cultural shock and its branching ratio — how long a naming trend echoes.
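The two quantities named here have simple closed forms if one assumes an exponential decay kernel, which is a common choice but an assumption on our part for this sketch. All parameter values and event times below are hypothetical, not fitted results.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity: baseline mu plus an exponentially decaying
    contribution alpha * exp(-beta * (t - t_i)) from every earlier event t_i."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)

def half_life(beta):
    """Time for a single event's excitation to decay to half its initial value."""
    return math.log(2) / beta

def branching_ratio(alpha, beta):
    """Expected number of follow-on events triggered directly by one event
    (the integral of the kernel from 0 to infinity)."""
    return alpha / beta

mu, alpha, beta = 0.5, 0.6, 1.2   # hypothetical values, not fitted
events = [1.0, 1.5, 4.0]          # hypothetical event times, in years

print("half-life (years):", half_life(beta))
print("branching ratio:", branching_ratio(alpha, beta))
print("intensity at t=2.0:", hawkes_intensity(2.0, events, mu, alpha, beta))
```

A branching ratio below 1 means each echo triggers fewer than one further echo on average, so the cascade dies out; the half-life says how quickly.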

Bass diffusion model · Marketing science

Separates broadcast-driven adoption (you saw the movie) from peer-driven adoption (your friend named their kid that).
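The split between broadcast and peer adoption can be seen in a discrete-time version of the standard Bass model, sketched below. The coefficients and the market size are hypothetical values picked to contrast the two regimes; they are not estimates from the study.

```python
def bass_adoption(p, q, market_size, periods):
    """Discrete-time Bass model. p is the coefficient of innovation (external
    or broadcast influence); q is the coefficient of imitation (peer influence)."""
    cumulative = 0.0
    new_per_period = []
    for _ in range(periods):
        remaining = market_size - cumulative
        hazard = p + q * (cumulative / market_size)   # chance a non-adopter adopts
        new = hazard * remaining
        new_per_period.append(new)
        cumulative += new
    return new_per_period

# Hypothetical coefficients for two contrasting names.
broadcast_led = bass_adoption(p=0.10, q=0.05, market_size=1000, periods=15)
peer_led = bass_adoption(p=0.01, q=0.50, market_size=1000, periods=15)

print("broadcast-led peak period:", broadcast_led.index(max(broadcast_led)))
print("peer-led peak period:", peer_led.index(max(peer_led)))
```

In this sketch the broadcast-led curve peaks immediately and fades, while the peer-led curve starts slowly and peaks later, which is the shape difference that lets the two mechanisms be told apart.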

Synthetic controls · Causal inference / economics

Builds per-event counterfactuals — what would have happened to a name if the cultural event never occurred.
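A heavily simplified sketch of the idea: weight a set of unaffected "donor" names so that their blend tracks the affected name before the event, then project that blend forward as the counterfactual. The version below assumes only two donors and uses a grid search; real applications use many donors and a constrained optimizer. All series are hypothetical.

```python
def fit_two_donor_weights(treated_pre, donor_a_pre, donor_b_pre, steps=100):
    """Find w in [0, 1] so that w*donor_a + (1-w)*donor_b best matches the
    treated name's pre-event series (least squares, by grid search)."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        err = sum((t - (w * a + (1 - w) * b)) ** 2
                  for t, a, b in zip(treated_pre, donor_a_pre, donor_b_pre))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# Hypothetical births per year; the event happens after year index 3.
treated = [100, 110, 120, 130, 260, 300]
donor_a = [80, 90, 100, 110, 120, 130]
donor_b = [120, 130, 140, 150, 160, 170]
pre = 4  # number of pre-event years

w = fit_two_donor_weights(treated[:pre], donor_a[:pre], donor_b[:pre])
counterfactual = [w * a + (1 - w) * b for a, b in zip(donor_a[pre:], donor_b[pre:])]
effect = [obs - cf for obs, cf in zip(treated[pre:], counterfactual)]

print("donor weight:", w)                 # 0.5
print("counterfactual:", counterfactual)  # [140.0, 150.0]
print("estimated effect:", effect)        # [120.0, 150.0]
```

The estimated effect is the gap between what was observed after the event and what the donor blend suggests would have happened without it.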

Supporting methods include Granger causality for lead-lag structure between search interest and births, Hill-curve saturation modeling for the “Blockbuster Paradox” (do mega-hits actually produce fewer namesakes?), and survival analysis for name lifespan after a cultural event.
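For readers unfamiliar with Hill curves, the sketch below shows the standard saturating form. The parameter names and values are hypothetical. Note that a plain Hill curve only levels off; it never turns downward, so testing whether mega-hits produce fewer namesakes would need an extension beyond what is shown here.

```python
def hill(exposure, max_effect, half_saturation, steepness):
    """Hill curve: rises with exposure and levels off at max_effect.
    half_saturation is the exposure at which half of max_effect is reached."""
    x_n = exposure ** steepness
    return max_effect * x_n / (half_saturation ** steepness + x_n)

# Hypothetical parameters: effect measured in extra births, exposure in arbitrary units.
for exposure in (0, 50, 100, 200, 1000):
    print(exposure, round(hill(exposure, max_effect=50, half_saturation=100, steepness=2), 2))
```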

Questions we're answering

  • How much of name turnover is pure fashion drift vs. driven by identifiable cultural events?
  • Does Google search interest actually predict SSA birth registrations — and with what lag?
  • When a cultural shock hits a name, what is its half-life? Does a royal baby produce a longer echo than a hit film?
  • Do blockbuster movies with hyper-exposure actually produce fewer namesakes than moderately popular films?
  • When a name spikes, how much of the new naming mass lands on that exact name vs. its phonetic neighbors?
  • Can we build per-event counterfactuals — what would "Arya" look like if Game of Thrones had never aired?
  • Is there a predictability ceiling for name popularity, and how close can any model get?

Data sources

Our analysis draws on both internal datasets we've built and established external sources. Together they form what we believe is the most comprehensive name-culture dataset assembled for this kind of study.

Source | What it provides | Scale
SSA birth registrations | Annual name counts, 1880–2024 | 2.1M rows
Google Trends | Weekly search interest for ~25K names | 6.6M rows
Cultural event attribution | Spike detection + film/TV/celebrity attribution | Thousands of events
CMU Pronouncing Dictionary | Phoneme decomposition (ARPAbet) | 126K entries
Phoneme analysis (neural) | g2p_en fallback for names not in CMU | 43K names
SSA state-level data | Geographic diffusion patterns | ~6M rows
OMDb / TMDb | Film & TV metadata, cast, box office | 22K+ titles
Google Books Ngrams | Pre-internet cultural baseline, 1800–2019 | Millions of tokens
Wikipedia pageviews | Independent cultural attention signal | 2015–present

Pipeline status

Our research pipeline is a multi-phase system that moves from raw data acquisition through statistical modeling to final findings. Here's where each phase stands.

Phase 0: Scaffold and core libraries · Complete

Research infrastructure: structured logging, resumable processing, streaming parquet I/O, rate limiting, and database connectivity.

Phase 1: Internal data snapshot · Complete

SSA records (2.1M rows), search trends (7.4M weekly observations), spike events (1,335 detected anomalies), and enrichment data exported to versioned parquet.

Phase 2: External data acquisition · Complete

13 external sources acquired: CMU phonemes (134K), SSA national + state-level (8.1M rows), Google Books Ngrams (5.8M), TMDb titles + cast (5.1K), CDC natality, GDELT mentions, place names, and more. Google Trends fetch ~81% complete.

Phase 3: Phonetic decomposition · Complete

All 43,334 SSA names decomposed to ARPAbet phonemes — 22% from CMU dictionary, 78% via neural g2p model. This enables 'names that sound like X' analysis.

Phase 4: Panel construction · Complete

Annual panel (1.96M rows, 104,819 names, 1880–2024), weekly panel (7.4M rows, 27,577 names), and event panel structure built. 55.6% search coverage, 100% phonetic density.

Phase 5: Null model (neutral drift) · Complete

Lieberson neutral drift + phonetic fashion null models calibrated and validated. Female N_e ≈ 9,850, male N_e ≈ 22,320. 84% of names are drift-consistent; 1.3% show strong cultural influence — the signal we use to classify names.

Phase 6: Phonetic spillover analysis · Complete

34M-edge phonetic neighborhood graph built. Phonetic neighbors co-move significantly (r=0.18 vs 0.07 random control, p=1.1×10⁻¹²). 1,221 phonetic clusters identified.

Phase 7: Diffusion modeling · In progress

Granger causality complete: search interest predicts births for 21.5% of names (p < 0.05), with film events showing positive effects and news events showing avoidance. Hawkes process and Bass diffusion fits in progress.

Phase 8: Causal analysis · Upcoming

Synthetic control counterfactuals for the 200 best-attributed cultural events, plus the Hill-curve Blockbuster Paradox analysis.

Phase 9: Heterogeneity decomposition · Upcoming

What predicts which events drive naming? Event type, name origin, phonetic neighborhood, generation-cycle position — the full Lieberson decomposition.

Phase 10: Geographic and predictability analysis · Upcoming

Spatial diffusion across U.S. states and the Salganik predictability ceiling — how well can any model forecast name popularity?

Key findings so far

Phases 5 and 6 and the Granger-causality portion of Phase 7 (Phase 7a) are complete: the Lieberson null model, phonetic spillover analysis, and Granger causality tests have been run against 145 years of SSA data and 20 years of Google Trends. Here's what we've found.

1. Random drift can't explain most well-established names

When we simulate 145 years of name popularity under pure neutral drift (the Lieberson hypothesis that naming is essentially random fashion cycling), about 2% of names exceed the model's 99th-percentile threshold in any given year. But across their full history, 64.6% of names with 50+ years of data exceed that threshold in at least one year. Random drift is real, but it isn't the whole story.
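For scale, it helps to know how often a name would cross a 99th-percentile threshold at least once by chance alone. The sketch below assumes each year is an independent 1% draw. That is a simplification (name counts are strongly correlated from year to year), so treat the result as a rough reference point and not as the study's own comparison.

```python
def chance_of_at_least_one_exceedance(years, per_year_rate=0.01):
    """Probability of crossing the threshold in at least one of `years`
    independent years, each with probability per_year_rate."""
    return 1 - (1 - per_year_rate) ** years

for years in (50, 100, 145):
    print(years, round(chance_of_at_least_one_exceedance(years), 3))
```

Under that independence assumption, a name with 50 years of history would cross the threshold at least once roughly 39.5% of the time, which is the kind of reference figure the 64.6% can be read against.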

2. The null model catches sustained influence, not spikes

Names like Xander (above the 99th-percentile threshold in 66% of years) and Nevaeh (52%) show non-drift-like trajectories across decades. Traditional names with deep cultural roots, such as Paul, Elizabeth, and Robert, also beat the model in around 50% of years, because their persistence itself is culturally enforced, not random. Single-event spikes (Khaleesi, Loki, Elden) don't beat the model on this metric; detecting those requires per-event attribution, which is the next phase.

3. Girl names change faster than boy names

The effective population size for female names (N_e ≈ 9,852) is less than half that of male names (N_e ≈ 22,320). In plain English: the pool of actively competing girl names is smaller, so individual names rise and fall faster. Parents naming daughters are measurably more fashion-sensitive than parents naming sons.

4. Names spread by sound, not just by spelling

Phonetic neighbors (names differing by one or two sounds) co-move significantly more than random pairs (mean r=0.18 vs 0.07 control, Welch t=7.21, p=1.1×10⁻¹²). The effect is modest in size: when a cultural event drives up one name, its phonetic neighbors tend to rise with it, but only weakly. We identified 1,221 distinct phonetic clusters across 43,334 names.
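The comparison reported here is a Welch t-test, which compares two group means without assuming equal variances. A minimal sketch of the statistic follows; the correlation values are made up for illustration and are not drawn from the study.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
    samples with possibly unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical pairwise correlations, not values from the study.
neighbor_pairs = [0.25, 0.10, 0.22, 0.15, 0.18]
random_pairs = [0.05, 0.09, 0.02, 0.11, 0.08]

t, df = welch_t(neighbor_pairs, random_pairs)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large positive t means neighbor pairs are more correlated on average than random pairs, relative to the spread within each group.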

5. Google search interest actually predicts baby naming

Granger causality tests on 4,185 names show that search interest predicts SSA birth registrations for 21.5% of names (p < 0.05), with a median optimal lag of 1 year. Film and celebrity events produce positive impulse responses (+0.0063 at 1-year horizon), while news events produce negative responses, a pattern consistent with parents avoiding names in the news.
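The notion of an "optimal lag" can be illustrated with a lagged correlation, which is a much simpler stand-in for a Granger test (a real Granger test also controls for the births series' own past values). The series below are hypothetical and constructed so that births follow search interest by one year.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(search, births, max_lag=3):
    """Return the lag (in years) at which past search interest correlates
    most strongly with later births, plus the score for every lag tried."""
    scores = {}
    for lag in range(max_lag + 1):
        x = search[:len(search) - lag] if lag else search
        y = births[lag:]
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores

# Hypothetical yearly series: births[t] = 50 + 5 * search[t - 1] for t >= 1.
search = [10, 12, 30, 25, 15, 14, 40, 22]
births = [100, 100, 110, 200, 175, 125, 120, 250]

lag, scores = best_lag(search, births)
print("best lag (years):", lag)   # 1
```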

6. Film characters boost naming; news events suppress it

Panel VAR impulse response functions show film-character names have 2–7× larger positive effects than baseline. News-event names show the opposite: a measurable avoidance effect. This 'aspirational vs cautionary' split is the empirical foundation for how Namesake distinguishes trending names from names merely in the news.

7. 46,412 names analyzed across 1.95 million data points

The full threshold computation covers 46,412 names from 1880–2024, generating 1,950,660 per-name-year threshold rows. Every SSA name with at least a decade of history now has an empirical answer to the question 'is this name's trajectory explainable by fashion drift?'

8. 5.8 million pre-internet data points for context

We matched SSA names against 220 years of English fiction via Google Books Ngrams (1800–2019). Names that appeared frequently in published fiction before the internet era were already part of the internal fashion process — this prevents us from falsely attributing centuries-old trends to modern media.

These findings span Phases 5–7a. As the Hawkes process modeling, Bass diffusion, and synthetic control analysis complete, we'll add findings on trend half-lives, broadcast vs. peer-driven adoption, the Blockbuster Paradox, and per-event causal impact.

The strongest non-drift signals

These 24 names (20+ years of SSA history) have the highest share of years beating our neutral-drift null at the 99th percentile. Traditional classics dominate: their decades-long persistence at top ranks isn't what random fashion drift predicts.

Note: this metric catches sustained non-drift behavior, not one-off spikes. Names like Khaleesi, Elden, or Loki have brief spikes that are real but too short to beat the model on this measure — those surface through the separate cultural attribution pipeline.

Why this matters for parents

This research isn't just academic curiosity — it directly shapes the recommendations you see on Namesake. Understanding the mechanics of name trends means we can tell you things no other baby name site can:

  • Whether a name you love is riding a cultural wave that's likely to crest (or has already peaked).
  • Which names are trending because of genuine, broad cultural shift vs. a single event that will fade.
  • How a name's sound profile connects it to a neighborhood of similar names — and whether that neighborhood is growing or shrinking.
  • The difference between a name that's gaining because everyone saw the same movie and one that's spreading organically through communities.

The Namesake Cultural Diffusion Study is an ongoing research effort. For questions or collaboration inquiries, reach us at hello@namesake.baby.