Our Research

Data current through 2024

Most baby name sites hand you a list sorted alphabetically and wish you luck. We took a different approach: build a research pipeline that treats naming the way an academic would — with real data, rigorous methodology, and genuine curiosity about why certain names capture a generation's imagination.

We don't just list names — we study how names move through culture.

104,819 names analyzed

2.1M historical records

5.8M pre-internet data points

145 years of SSA data (1880–2024)

What we're studying

The Namesake Cultural Diffusion Study asks a question that sounds simple but has never been properly answered: when a character appears in a hit film, a royal baby is born, or an athlete breaks a record, how does that cultural moment ripple through actual baby-naming decisions?

The dominant academic position (Lieberson 2000, Hahn & Bentley 2003) holds that names change through a self-reinforcing fashion cycle — essentially random drift — and that cultural events are post-hoc rationalizations. A competing view says cultural shocks have real, measurable causal effects on naming. Our dataset lets us put both theories on trial with evidence neither side has had access to.

This isn't an abstract exercise. The answers directly improve the recommendations Namesake gives you: understanding which names are riding a cultural wave (and which waves crest quickly) helps parents choose names they'll still love in five years.

Methodology

Our study is organized around five established research frameworks, each contributing a specific analytical component. We chose these because each one produces a concrete, testable number — not just a narrative.

Neutral drift null model · Sociology / population genetics

Establishes how much name change happens without any cultural cause — the baseline everything else is measured against.

Phonetic neighborhoods · Psycholinguistics (Berger et al.)

Names spread by sound, not spelling. The unit of contagion is the onset phoneme, not the name string itself.

Hawkes self-exciting processes · Seismology / social cascades

Quantifies the half-life of a cultural shock (how long a naming trend echoes) and its branching ratio (how much follow-on naming each burst of adoption triggers).

Bass diffusion model · Marketing science

Separates broadcast-driven adoption (you saw the movie) from peer-driven adoption (your friend named their kid that).

Synthetic controls · Causal inference / economics

Builds per-event counterfactuals — what would have happened to a name if the cultural event never occurred.
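Two of these frameworks boil down to a few interpretable parameters. The Python sketch below assumes an exponential Hawkes kernel and the textbook discrete-time Bass model. The function names, parameter values and one-step-per-period choice are illustrative assumptions, not Namesake's fitted models.

```python
import math


def hawkes_summary(alpha: float, beta: float) -> dict:
    """Summarize a Hawkes process with exponential kernel phi(t) = alpha * beta * exp(-beta * t).

    The kernel integrates to alpha over t >= 0, so alpha is the branching ratio
    (expected follow-on events triggered by each event).
    """
    if not 0 <= alpha < 1:
        raise ValueError("branching ratio must be in [0, 1) for a stable process")
    return {
        "branching_ratio": alpha,
        # Time for the kernel to decay to half its initial height.
        "half_life": math.log(2) / beta,
        # Expected events per exogenous shock: 1 + alpha + alpha^2 + ... = 1 / (1 - alpha).
        "cascade_size": 1.0 / (1.0 - alpha),
    }


def bass_adoption(p: float, q: float, m: float, periods: int) -> list[tuple[float, float]]:
    """Split per-period Bass adoption into broadcast and peer components.

    New adopters = (p + q * N / m) * (m - N), with N the cumulative adopters so far.
    p * (m - N) is the broadcast part; q * (N / m) * (m - N) is the peer part.
    """
    cumulative = 0.0
    rows = []
    for _ in range(periods):
        remaining = m - cumulative
        broadcast = p * remaining
        peer = q * (cumulative / m) * remaining
        rows.append((broadcast, peer))
        cumulative += broadcast + peer
    return rows


if __name__ == "__main__":
    print(hawkes_summary(alpha=0.5, beta=math.log(2) / 1.4))
    for t, (b, s) in enumerate(bass_adoption(p=0.03, q=0.4, m=1000, periods=5)):
        print(t, round(b, 1), round(s, 1))
```

Setting beta to ln 2 / 1.4 gives a 1.4-week half-life, the median the pipeline reports. The alpha of 0.5 is arbitrary. In the Bass output, the first period is pure broadcast, and the peer share grows as cumulative adoption builds.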

Supporting methods include Granger causality for lead-lag structure between search interest and births, Hill-curve saturation modeling for the “Blockbuster Paradox” (do mega-hits actually produce fewer namesakes?), and survival analysis for name lifespan after a cultural event.
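A Hill curve is a standard way to express saturation: namesakes rise with exposure, then level off. The minimal sketch below assumes exposure is some positive measure, such as audience size, and the parameter names are ours. A Hill curve with positive n never decreases, so it can express diminishing returns but not an outright drop. One way to probe the "fewer namesakes" question is to check whether the biggest hits fall below the fitted curve, which is what the residuals helper computes.

```python
def hill(exposure: float, v_max: float, k: float, n: float) -> float:
    """Hill saturation curve: v_max * x^n / (k^n + x^n).

    v_max is the ceiling on namesakes, k is the exposure at which half the
    ceiling is reached, and n controls how sharply the curve bends.
    """
    if exposure <= 0:
        return 0.0
    xn = exposure ** n
    return v_max * xn / (k ** n + xn)


def residuals(points: list[tuple[float, float]], v_max: float, k: float, n: float) -> list[float]:
    """Observed minus fitted namesakes for (exposure, observed) pairs.

    Large negative residuals concentrated among the highest-exposure titles
    would be the pattern a Blockbuster Paradox predicts.
    """
    return [observed - hill(x, v_max, k, n) for x, observed in points]


if __name__ == "__main__":
    for exposure in (10, 50, 100, 500, 1000):
        print(exposure, round(hill(exposure, v_max=800, k=100, n=1.5), 1))
```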

Questions we're answering

• How much of name turnover is pure fashion drift vs. driven by identifiable cultural events?
• Does Google search interest actually predict SSA birth registrations — and with what lag?
• When a cultural shock hits a name, what is its half-life? Does a royal baby produce a longer echo than a hit film?
• Do blockbuster movies with hyper-exposure actually produce fewer namesakes than moderately popular films?
• When a name spikes, how much of the new naming mass lands on that exact name vs. its phonetic neighbors?
• Can we build per-event counterfactuals — what would "Arya" look like if Game of Thrones had never aired?
• Is there a predictability ceiling for name popularity, and how close can any model get?

Data sources

Our analysis draws on both internal datasets we've built and established external sources. Together they form what we believe is the most comprehensive name-culture dataset assembled for this kind of study.

Source	What it provides	Scale
SSA birth registrations	Annual name counts, 1880-2024	2.1M rows
Google Trends	Weekly search interest for ~25K names	6.6M rows
Cultural event attribution	Spike detection + film/TV/celebrity attribution	Thousands of events
CMU Pronouncing Dictionary	Phoneme decomposition (ARPAbet)	126K entries
Phoneme analysis (neural)	g2p_en fallback for names not in CMU	43K names
SSA state-level data	Geographic diffusion patterns	~6M rows
OMDb / TMDb	Film & TV metadata, cast, box office	22K+ titles
Google Books Ngrams	Pre-internet cultural baseline, 1800-2019	Millions of tokens
Wikipedia pageviews	Independent cultural attention signal	2015-present

Pipeline status

Our research pipeline is a multi-phase system that moves from raw data acquisition through statistical modeling to final findings. Here's where each phase stands.

Scaffold and core libraries · Complete

Research infrastructure: structured logging, resumable processing, streaming parquet I/O, rate limiting, and database connectivity.

Internal data snapshot · Complete

SSA records (2.1M rows), search trends (7.4M weekly observations), spike events (1,335 detected anomalies), and enrichment data exported to versioned parquet.

External data acquisition · Complete

13 external sources acquired: CMU phonemes (134K), SSA national + state-level (8.1M rows), Google Books Ngrams (5.8M), TMDb titles + cast (5.1K), CDC natality, GDELT mentions, place names, and more. Google Trends fetch ~81% complete.

Phonetic decomposition · Complete

All 43,334 SSA names decomposed to ARPAbet phonemes — 22% from CMU dictionary, 78% via neural g2p model. This enables 'names that sound like X' analysis.

Panel construction · Complete

Annual panel (1.96M rows, 104,819 names, 1880–2024), weekly panel (7.4M rows, 27,577 names), and event panel structure built. 55.6% search coverage, 100% phonetic density.

Null model (neutral drift) · Complete

Lieberson neutral drift + phonetic fashion null models calibrated and validated. Female N_e ≈ 9,850, male N_e ≈ 22,320. 84% of names are drift-consistent; 1.3% show strong cultural influence — the signal we use to classify names.

Phonetic spillover analysis · Complete

34M-edge phonetic neighborhood graph built. Phonetic neighbors co-move significantly (r=0.18 vs 0.07 random control, p=1.1×10⁻¹²). 1,221 phonetic clusters identified.

Diffusion modeling · Complete

Granger causality: of 7,530 names with numerically valid tests, search interest predicts births for 19.1% (p < 0.05), with film events showing positive impulse responses and news events showing avoidance. Hawkes processes (median half-life 1.4 weeks) and Bass diffusion models (peer imitation dominates broadcast) have been fit.

Causal analysis · Complete

Synthetic-control counterfactuals for the 200 best-attributed events. We report per-event synthetic-control-adjusted divergences, not pooled causal effects; only 24 of 200 clear a strict causal bar (placebo p < 0.10 and post/pre-MSPE ratio > 5). The Hill-curve Blockbuster Paradox is not supported in this sample.

Heterogeneity decomposition · Complete

Lieberson variance decomposition over the divergence panel: event characteristics and name-intrinsic features each explain only about 1% of cross-event variance, so the two are comparable rather than an order of magnitude apart. Because the decomposition's dependent variable is itself a model output, the result is descriptive rather than a headline finding.

Geographic and predictability analysis · Complete

Spatial diffusion across U.S. states (Moran's I weakening in the streaming era) and a Salganik-style predictability exercise: prior-year rank is the dominant predictor of next-year rank; cultural and phonetic features add a small but real edge (gradient-boosted trees, precision@25 = 0.96, vs an AR(1) baseline at 0.12).
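The spatial result in this last pipeline step rests on Moran's I, the standard index of spatial autocorrelation. Positive values mean neighboring states have similar rates for a name, values near zero mean no spatial pattern, and negative values mean neighbors tend to differ. The minimal sketch below assumes a binary, symmetric adjacency matrix. The four-region chain is a toy, not U.S. geography.

```python
def morans_i(values: list[float], weights: list[list[float]]) -> float:
    """Global Moran's I for values observed on N regions.

    weights[i][j] > 0 when regions i and j are neighbors (diagonal = 0).
    I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / total_weight) * cross / variance_term


if __name__ == "__main__":
    # Toy chain of four regions: 0-1-2-3.
    chain = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(morans_i([1, 1, 0, 0], chain))  # neighbors alike: positive
    print(morans_i([1, 0, 1, 0], chain))  # neighbors differ: negative
```

Computing this index year by year for one name's state-level shares is one way to see whether spatial clustering weakens over time.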

Key findings so far

Phases 5 through 11 of our analysis are complete: the Lieberson null model, phonetic spillover, Granger causality, the Hawkes and Bass time-series fits, the synthetic-control divergences, the variance decomposition, and the geographic and predictability work have all been run against 145 years of SSA data and 20 years of Google Trends. Here's what we've found.

Random drift can't explain most well-established names

When we simulate 145 years of name popularity under pure neutral drift — the Lieberson hypothesis that naming is essentially random fashion cycling — about 2% of names beat the model in any given year. But across their full history, 64.6% of names with 50+ years of data beat neutral drift at the 99th percentile in at least one year. Random drift is real, but it isn't the whole story.
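Part of the gap between "about 2% in any given year" and "64.6% in at least one year" is arithmetic, because many chances to clear a 1% bar add up. The sketch below makes that concrete under two simplifying assumptions of ours: one Wright-Fisher generation per year, and independence between years. The study's calibrated null model is more involved, and the function names are hypothetical.

```python
import random


def wf_step_threshold(share: float, n_e: int, q: float = 0.99,
                      sims: int = 2000, seed: int = 0) -> float:
    """Monte Carlo q-quantile of |change in share| over one Wright-Fisher generation.

    Each of the n_e "offspring" independently takes the name with probability
    `share`, so the next generation's count is Binomial(n_e, share).
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(sims):
        count = sum(rng.random() < share for _ in range(n_e))
        changes.append(abs(count / n_e - share))
    changes.sort()
    return changes[min(int(q * sims), sims - 1)]


def chance_of_at_least_one_beat(years: int, q: float = 0.99) -> float:
    """P(a purely drifting name clears the per-year q-threshold at least once),
    assuming years are independent."""
    return 1 - q ** years


if __name__ == "__main__":
    # Small n_e keeps the simulation fast; the study's N_e values are in the thousands.
    print(wf_step_threshold(share=0.1, n_e=500))
    print(chance_of_at_least_one_beat(50))
```

Under these assumptions, a name that drifts for 50 years would clear a per-year 99th-percentile threshold at least once with probability 1 − 0.99^50 ≈ 0.39. That is the chance-only baseline against which the 64.6% figure can be read.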

The null model catches sustained influence, not spikes

Names like Xander (66% of years beat p99) and Nevaeh (52%) show non-drift-like trajectories across decades. Traditional names with deep cultural roots (Paul, Elizabeth, Robert) also beat the model in around 50% of years, because their persistence itself is culturally enforced, not random. Single-event spikes (Khaleesi, Loki, Elden) don't beat the model on this metric; detecting those requires per-event attribution, which is handled by the separate cultural attribution pipeline.

Girl names change faster than boy names

The effective population size for female names (N_e ≈ 9,852) is less than half that of male names (N_e ≈ 22,320). In plain English: the pool of actively competing girl names is smaller, so individual names rise and fall faster. Parents naming daughters are measurably more fashion-sensitive than parents naming sons.
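The link between a smaller N_e and faster change comes from the Wright-Fisher variance of a one-generation change in a name's share p. We assume that both N_e estimates are on the same time scale (one generation per birth year). Under that assumption, girl and boy names at the same share drift with standard deviations in this ratio:

```latex
\operatorname{Var}(\Delta p) = \frac{p(1-p)}{N_e}
\qquad\Longrightarrow\qquad
\frac{\sigma_{\text{girls}}}{\sigma_{\text{boys}}}
  = \sqrt{\frac{N_{e,\text{boys}}}{N_{e,\text{girls}}}}
  = \sqrt{\frac{22{,}320}{9{,}852}} \approx 1.51
```

Under drift alone, then, a girl name's share would wobble roughly 1.5 times as much per step as a boy name at the same popularity.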

Names spread by sound, not just by spelling

Phonetic neighbors (names differing by one or two sounds) co-move significantly more than random pairs (mean r = 0.18 vs 0.07 for a random control, Welch t = 7.21, p = 1.1×10⁻¹²). When a cultural event drives up one name, its phonetic neighbors tend to move with it, though the co-movement is modest. We identified 1,221 distinct phonetic clusters across 43,334 names.
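Two ingredients of this result can be sketched in a few lines:

- a phoneme-level edit distance for "differs by one or two sounds";
- Welch's t-test for comparing neighbor-pair correlations with random-pair correlations.

The sketch assumes names are already split into ARPAbet tokens, as the CMU dictionary and g2p_en provide. The transcriptions below are hand-written for illustration, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def phoneme_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein edit distance over phoneme tokens instead of letters."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def is_phonetic_neighbor(a: list[str], b: list[str], max_dist: int = 2) -> bool:
    """Pronunciations that differ by 1..max_dist phonemes."""
    return 0 < phoneme_distance(a, b) <= max_dist


def welch_t(x: list[float], y: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


if __name__ == "__main__":
    # Hand-written ARPAbet transcriptions for illustration (stress digits kept).
    aiden = ["EY1", "D", "AH0", "N"]
    jayden = ["JH", "EY1", "D", "AH0", "N"]
    brayden = ["B", "R", "EY1", "D", "AH0", "N"]
    print(phoneme_distance(aiden, jayden), phoneme_distance(aiden, brayden))
```

In a full analysis, `x` would hold per-pair correlations for neighbor pairs and `y` those for randomly drawn pairs. Welch's variant is used because the two groups need not share a variance.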

Google search interest actually predicts baby naming

Of 16,515 names with enough joint search-and-birth history, 7,530 produce numerically valid Granger tests, and 19.1% of those show that search interest predicts SSA birth registrations (p < 0.05), with a median optimal lag of 1 year. Film and celebrity events produce positive impulse responses (+0.006 at the 1-year horizon), while news events produce negative responses — parents actively avoid names in the news.
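A lag-1 Granger test fits two regressions of births on their own past, one without and one with last period's search interest. The minimal Python sketch below assumes stationary annual series and a single lag. It does not attempt the per-name lag selection implied by the "optimal lag" above, and the helper names are hypothetical. It returns the F statistic only; a p-value comes from the F(1, n − 3) distribution, for example via `scipy.stats.f.sf(F, 1, n - 3)`.

```python
def _ols_rss(rows: list[list[float]], target: list[float]) -> float:
    """Residual sum of squares of an OLS fit, solved via the normal equations.

    Assumes X'X is non-singular (no perfectly collinear regressors).
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, target))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum(
        (y - sum(b * x for b, x in zip(beta, r))) ** 2 for r, y in zip(rows, target)
    )


def granger_f_lag1(search: list[float], births: list[float]) -> float:
    """F statistic for H0: lagged search adds nothing beyond lagged births.

    Restricted:   births[t] ~ 1 + births[t-1]
    Unrestricted: births[t] ~ 1 + births[t-1] + search[t-1]
    """
    y = births[1:]
    restricted = [[1.0, births[t - 1]] for t in range(1, len(births))]
    unrestricted = [[1.0, births[t - 1], search[t - 1]] for t in range(1, len(births))]
    rss_r = _ols_rss(restricted, y)
    rss_u = _ols_rss(unrestricted, y)
    n_obs, n_params, n_restrictions = len(y), 3, 1
    return ((rss_r - rss_u) / n_restrictions) / (rss_u / (n_obs - n_params))
```

A large F means last year's search interest explains variation in births that last year's births alone do not. That is the sense in which search "predicts" births here, which is weaker than proving causation.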

Film characters boost naming; news events suppress it

Panel VAR impulse response functions show film-character names have 2-7× larger positive effects than baseline. News-event names show the opposite: a measurable avoidance effect. This 'aspirational vs cautionary' split is the empirical foundation for how Namesake distinguishes trending names from names merely in the news.

46,412 names analyzed across 1.95 million data points

The full threshold computation covers 46,412 names from 1880–2024, generating 1,950,660 per-name-year threshold rows. Every SSA name with at least a decade of history now has an empirical answer to the question 'is this name's trajectory explainable by fashion drift?'

5.8 million pre-internet data points for context

We matched SSA names against 220 years of English fiction via Google Books Ngrams (1800–2019). Names that appeared frequently in published fiction before the internet era were already part of the internal fashion process — this prevents us from falsely attributing centuries-old trends to modern media.

The later phases — Hawkes half-lives, Bass broadcast-vs-peer adoption, the synthetic-control divergences, and the variance decomposition — are summarized in the pipeline steps above and detailed in the per-phase reports. The honest read: cultural gravity is real at the population level, but the per-event proof is humbler than the headlines suggest — only 24 of 200 well-attributed events clear a strict causal bar, and we report the rest as synthetic-control-adjusted divergences, not proven effects.
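The strict causal bar combines two numbers per event:

- the post/pre-MSPE ratio, which measures how much worse the synthetic control fits after the event than before;
- how that ratio ranks against placebo runs on untreated names.

Below is a minimal sketch of that standard placebo-in-space logic, taking the fitted synthetic series as given. The function names are ours, and the default cut-offs mirror the text (p < 0.10, ratio > 5).

```python
import math


def mspe(actual: list[float], synthetic: list[float]) -> float:
    """Mean squared prediction error between a series and its synthetic control."""
    return sum((a - s) ** 2 for a, s in zip(actual, synthetic)) / len(actual)


def post_pre_ratio(actual: list[float], synthetic: list[float], event_index: int) -> float:
    """Post-event MSPE divided by pre-event MSPE (how much the fit degrades after the event)."""
    pre = mspe(actual[:event_index], synthetic[:event_index])
    post = mspe(actual[event_index:], synthetic[event_index:])
    return post / pre if pre > 0 else math.inf


def placebo_p_value(treated_ratio: float, placebo_ratios: list[float]) -> float:
    """Share of units (treated included) whose ratio is at least the treated ratio."""
    ratios = [treated_ratio] + list(placebo_ratios)
    return sum(r >= treated_ratio for r in ratios) / len(ratios)


def clears_strict_bar(treated_ratio: float, placebo_ratios: list[float],
                      p_cut: float = 0.10, ratio_cut: float = 5.0) -> bool:
    """Both conditions must hold: placebo p below p_cut and ratio above ratio_cut."""
    p = placebo_p_value(treated_ratio, placebo_ratios)
    return p < p_cut and treated_ratio > ratio_cut
```

The smallest attainable placebo p-value is 1 / (number of placebos + 1). Clearing p < 0.10 therefore needs at least 10 placebo names per event, however large the treated ratio is.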

The strongest non-drift signals

These 24 names (20+ years of SSA history) have the highest share of years beating our neutral-drift null at the 99th percentile. Traditional classics dominate — their decades-long persistence at top ranks isn't what random fashion drift predicts.

Note: this metric catches sustained non-drift behavior, not one-off spikes. Names like Khaleesi, Elden, or Loki have brief spikes that are real but too short to beat the model on this measure — those surface through the separate cultural attribution pipeline.
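The share-of-years metric behind this ranking can be sketched as follows. The sketch is a simplification of ours: it uses a normal approximation to one Wright-Fisher step, sd = sqrt(p(1 − p) / N_e), rather than the calibrated null model. The function names and toy series are hypothetical.

```python
from statistics import NormalDist


def drift_threshold(prev_share: float, n_e: float, q: float = 0.99) -> float:
    """Two-sided q-threshold on |one-step change in share| under Wright-Fisher
    drift, via the normal approximation sd = sqrt(p * (1 - p) / N_e)."""
    sd = (prev_share * (1 - prev_share) / n_e) ** 0.5
    return NormalDist().inv_cdf(0.5 + q / 2) * sd


def share_of_years_beating(shares: list[float], n_e: float, q: float = 0.99) -> float:
    """Fraction of year-over-year moves that exceed the drift threshold."""
    moves = list(zip(shares, shares[1:]))
    beats = sum(abs(curr - prev) > drift_threshold(prev, n_e, q) for prev, curr in moves)
    return beats / len(moves)


if __name__ == "__main__":
    spike = [0.001] * 10 + [0.003] + [0.001] * 10   # one brief, event-style spike
    steady = [0.001 * 1.5 ** t for t in range(10)]   # sustained climb
    print(share_of_years_beating(spike, n_e=9852))
    print(share_of_years_beating(steady, n_e=9852))
```

The brief spike clears the threshold only on the way up and on the way down, so its share of beating years stays low. The sustained climb keeps clearing it year after year.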

Published reports

What Makes a Baby Name Go Viral?

A data analysis of 1,141 cultural events and their impact on baby naming across 145 years of SSA birth records

Mike West · April 2026

Methodology

Data sources, analytical methods, and limitations for the Namesake research pipeline.

Mike West · April 2026

Lieberson Null Model Results

Full-scale Phase 5 analysis: 1.95M threshold rows across 46,412 names, 145 years of SSA data. Neutral drift + phonetic fashion null models with Wright-Fisher calibration.

Mike West · April 2026

Phonetic Spillover Analysis

Phase 6: phonetic neighborhood cross-correlation, clusters, Granger tests, and spillover magnitudes vs SSA panel.

Mike West · April 2026

Why this matters for parents

This research isn't just academic curiosity — it directly shapes the recommendations you see on Namesake. Understanding the mechanics of name trends means we can tell you things no other baby name site can:

• Whether a name you love is riding a cultural wave that's likely to crest (or has already peaked).
• Which names are trending because of genuine, broad cultural shift vs. a single event that will fade.
• How a name's sound profile connects it to a neighborhood of similar names — and whether that neighborhood is growing or shrinking.
• The difference between a name that's gaining because everyone saw the same movie and one that's spreading organically through communities.

The Namesake Cultural Diffusion Study is an ongoing research effort. For questions or collaboration inquiries, reach us at hello@namesake.baby.