---
title: "Replicating *The Bell Curve*"
subtitle: "AFQT, SES, and Social Outcomes in NLSY79/97 and GSS"
author: "Troy Altus"
date: "May 2026"
jupyter: bell-curve-replication
format:
  html:
    theme: cosmo
    toc: true
    toc-depth: 3
    number-sections: true
    fig-align: center
    self-contained: true
    code-fold: true
    code-tools: true
  pdf:
    documentclass: article
    number-sections: true
    colorlinks: true
    toc: true
    pdf-engine: latexmk
    pdf-engine-opts:
      - -lualatex
bibliography: references.bib
execute:
  warning: false
  message: false
---
## Abstract
Herrnstein and Murray's 1994 *The Bell Curve* argued that cognitive ability,
measured by AFQT scores from the National Longitudinal Survey of Youth 1979
(NLSY79), predicts a wide range of social outcomes — poverty, unemployment,
illegitimacy, welfare dependency, crime — at least as strongly as
socioeconomic background. This paper replicates their core analyses using the
original NLSY79 cohort and extends the replication to NLSY97 and the General
Social Survey (GSS, 1972–2022). All analyses follow H&M's logistic regression
specification: binary outcome regressed on AFQT z-score and a composite SES
index, with age and sex as covariates. Where H&M's findings replicate, we note
it; where they do not, we report the discrepancy and examine possible causes.
---
## Introduction
Few social science books have provoked more sustained controversy than
*The Bell Curve* [@herrnstein1994]. Its central empirical claim — that cognitive
ability measured in early adulthood predicts life outcomes more robustly than
family socioeconomic status — rested on a technically careful but highly
contested analysis of NLSY79 data. Replication is straightforward in principle:
the data are public, the method is described in the text and appendix, and the
statistical tools are standard. In practice, several complications arise:
variable coding has changed across NLSY data releases, the original extract is
not publicly archived, and the political sensitivity of the topic has discouraged
neutral replication attempts.
This paper proceeds without political agenda in either direction. The goal is to
determine whether the key quantitative findings hold up under modern methods and
newer data.
### Research questions
1. Does AFQT predict poverty, unemployment, educational attainment, and crime
after controlling for parental SES?
2. Has the relative predictive power of cognitive ability vs. SES changed between
the NLSY79 cohort (born 1957–64) and NLSY97 (born 1980–84)?
3. Do results from WORDSUM (GSS vocabulary test) as an IQ proxy replicate the
directional findings from AFQT?
---
## Data
### NLSY79
The Bureau of Labor Statistics National Longitudinal Survey of Youth, 1979
cohort, interviewed 12,686 respondents (born 1957–1964) annually from 1979
through 1994, then biennially through 2018. The AFQT (Armed Forces Qualification
Test) was administered in 1980 to nearly the full sample.
H&M's AFQT variable is the percentile score on the Armed Forces Qualification
Test, sections 2 (arithmetic reasoning), 3 (word knowledge), 4 (paragraph
comprehension), and half of section 5 (numerical operations). The BLS recomputed
and revised the AFQT score in 1989; H&M used the revised score.
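
As an illustration of the composite described above, the following sketch builds a raw score from four subtest columns and standardizes it. The column names (`asvab_ar`, `asvab_wk`, `asvab_pc`, `asvab_no`) are hypothetical placeholders rather than NLSY variable names, the input values are made up, and the sketch does not attempt to reproduce the 1989 revised scoring.

```python
import pandas as pd


def afqt_composite(df: pd.DataFrame) -> pd.Series:
    """Raw composite: sections 2, 3, 4 plus half of section 5.

    Column names are illustrative placeholders, not NLSY variable names.
    """
    return (
        df["asvab_ar"]          # section 2: arithmetic reasoning
        + df["asvab_wk"]        # section 3: word knowledge
        + df["asvab_pc"]        # section 4: paragraph comprehension
        + 0.5 * df["asvab_no"]  # half of section 5: numerical operations
    )


def zscore(s: pd.Series) -> pd.Series:
    """Standardize to mean 0 and sample standard deviation 1 (NaN ignored)."""
    return (s - s.mean()) / s.std(ddof=1)


# Made-up subtest scores, used only to exercise the two functions.
scores = pd.DataFrame(
    {
        "asvab_ar": [10, 20, 30],
        "asvab_wk": [15, 25, 35],
        "asvab_pc": [5, 10, 15],
        "asvab_no": [20, 40, 60],
    }
)
scores["afqt_raw"] = afqt_composite(scores)
scores["afqt_z"] = zscore(scores["afqt_raw"])
```
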
**Data access:** Custom extract via NLS Investigator (https://www.nlsinfo.org/investigator).
The targeted variables are listed in `src/acquire.py`. Targeted extract file size: **~5–15 MB CSV**.
### NLSY97
The NLSY97 cohort (born 1980–1984, n $\approx$ 8,984) was interviewed annually from
1997 through 2011, then biennially. AFQT was administered in 1997–98.
It is used here as a robustness check: the design is the same, but the cohort was born roughly 20 years later.
**File size:** ~10–25 MB CSV (targeted extract).
### General Social Survey (GSS)
The GSS (NORC, University of Chicago) is a nationally representative
cross-sectional survey of U.S. adults, fielded in most years from 1972 to 1994
and roughly every two years since, with cumulative n $\approx$ 72,000 through 2022.
The WORDSUM variable, a 10-item vocabulary test, correlates ~0.7 with standard IQ
measures and is the cognitive proxy H&M themselves validate against AFQT.
**File size:** Full cumulative Stata file download: **~200 MB** (zip). After
extracting to targeted variables: ~15–30 MB.
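
One way to trim the cumulative file down to a targeted extract is sketched below. It is not the code in `src/acquire.py`; the function name `extract_targeted` and the column list are assumptions made for illustration, and the list should be replaced with the variables the project actually uses.

```python
from pathlib import Path

import pandas as pd

# Assumed variable names for the targeted extract; adjust to match the
# variables actually listed in src/acquire.py.
TARGET_COLUMNS = ["year", "age", "sex", "wordsum"]


def extract_targeted(dta_path: Path, out_csv: Path, columns=TARGET_COLUMNS) -> pd.DataFrame:
    """Read only the targeted columns from a Stata file and save them as CSV."""
    # convert_categoricals=False keeps the numeric codes instead of value labels.
    df = pd.read_stata(dta_path, columns=list(columns), convert_categoricals=False)
    out_csv.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_csv, index=False)
    return df
```

Reading only the needed columns keeps memory use well below what loading the full cumulative file would require.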
```{python}
#| label: setup
import sys
sys.path.insert(0, ".")  # make the local `src` package importable

import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
from pathlib import Path

# Flag which raw extracts are present so later chunks can skip missing sources.
DATA_READY = {
    "nlsy79": any(Path("data/raw/nlsy79").glob("*.csv")),
    "nlsy97": any(Path("data/raw/nlsy97").glob("*.csv")),
    "gss": any(Path("data/raw/gss").glob("*.dta"))
    or any(Path("data/raw/gss").glob("*.csv")),
}

print("Data availability:")
for k, v in DATA_READY.items():
    print(f"  {k}: {'✓' if v else '✗ NOT FOUND (see src/acquire.py)'}")
```
```{python}
#| label: load-data
from src.nlsy import load_nlsy79, load_nlsy97, cache as nlsy_cache
from src.gss import load_gss, cache as gss_cache

# Load each available source and cache the cleaned frame.
dfs = {}
if DATA_READY["nlsy79"]:
    dfs["nlsy79"] = load_nlsy79()
    nlsy_cache(dfs["nlsy79"], "nlsy79")
    print(f"NLSY79: {len(dfs['nlsy79']):,} respondents")
if DATA_READY["nlsy97"]:
    dfs["nlsy97"] = load_nlsy97()
    nlsy_cache(dfs["nlsy97"], "nlsy97")
    print(f"NLSY97: {len(dfs['nlsy97']):,} respondents")
if DATA_READY["gss"]:
    dfs["gss"] = load_gss()
    gss_cache(dfs["gss"])
    print(f"GSS: {len(dfs['gss']):,} respondent-years")
```
---
## AFQT and SES Distribution
Before replicating H&M's outcome models, we verify that the cognitive and SES
measures are distributed as expected.
```{python}
#| label: fig-afqt-dist
#| fig-cap: "Cognitive z-score distribution by source (AFQT for NLSY, WORDSUM for GSS)"
import plotly.express as px
import plotly.graph_objects as go

# One overlaid histogram per source, using whichever cognitive z-score it has.
figs = []
for name, df in dfs.items():
    col = "afqt_z" if "afqt_z" in df.columns else "wordsum_z"
    if col not in df.columns:
        continue
    sub = df[col].dropna()
    figs.append(go.Histogram(x=sub, name=name, opacity=0.6, nbinsx=50))

if figs:
    fig = go.Figure(figs)
    fig.update_layout(
        barmode="overlay",
        title="Cognitive Measure Distribution",
        xaxis_title="Z-score",
        yaxis_title="Count",
        template="plotly_white",
    )
    fig.show()
```
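
A numeric check can complement the histogram. The sketch below summarizes a z-scored variable and flags departures from mean 0 and standard deviation 1. The function name and the 0.05 tolerance are assumptions chosen for illustration, and the demonstration input is simulated, not survey data.

```python
import numpy as np


def standardization_report(values, tol=0.05):
    """Summarize a z-scored variable and flag departures from mean 0, SD 1."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values before summarizing
    mean = float(x.mean())
    sd = float(x.std(ddof=1))
    return {
        "n": int(x.size),
        "mean": mean,
        "sd": sd,
        "ok": abs(mean) <= tol and abs(sd - 1.0) <= tol,
    }


# Simulated standard-normal draws stand in for a real z-score column.
rng = np.random.default_rng(0)
report = standardization_report(rng.standard_normal(10_000))
```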
---
## Poverty {#sec-poverty}
*The Bell Curve*, Chapter 5: Among whites in the NLSY, the
probability of being in poverty as a function of AFQT score, controlling for
SES, shows a strong negative gradient. H&M report that moving from the 5th to
the 95th AFQT percentile reduces poverty probability from ~26% to ~6%, holding
SES at the median.
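
The percentile contrast can be computed from fitted logit coefficients as sketched below. This is a minimal illustration under two stated assumptions: the cognitive score is treated as standard normal, so percentiles map to z-scores through the inverse normal CDF, and "SES at the median" is approximated by an SES z-score of 0. The function names are hypothetical and the coefficients are arguments, not estimates from this paper.

```python
import math
from statistics import NormalDist


def percentile_to_z(pct: float) -> float:
    """Map a percentile (0-100, exclusive) to a standard-normal z-score."""
    return NormalDist().inv_cdf(pct / 100.0)


def predicted_probability(intercept: float, b_cog: float, b_ses: float,
                          cog_z: float, ses_z: float = 0.0) -> float:
    """Inverse logit of the linear predictor, other covariates held at zero."""
    eta = intercept + b_cog * cog_z + b_ses * ses_z
    return 1.0 / (1.0 + math.exp(-eta))


def percentile_contrast(intercept: float, b_cog: float, b_ses: float,
                        low_pct: float = 5.0, high_pct: float = 95.0):
    """Predicted probabilities at two cognitive percentiles, SES z-score at 0."""
    return (
        predicted_probability(intercept, b_cog, b_ses, percentile_to_z(low_pct)),
        predicted_probability(intercept, b_cog, b_ses, percentile_to_z(high_pct)),
    )
```

Calling `percentile_contrast` with the intercept and coefficients from a fitted model returns the pair of probabilities that corresponds to H&M's 5th versus 95th percentile comparison.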
```{python}
#| label: tbl-poverty-quintiles
#| tbl-cap: "Poverty rate by cognitive quintile"
from src.models import quintile_table, logit_replicate
from src.plots import quintile_bar, prediction_curve

if "nlsy79" in dfs and "poor" in dfs["nlsy79"].columns:
    tbl = quintile_table(dfs["nlsy79"], "poor")
    display(tbl[["quintile", "rate_pct", "n"]].rename(
        columns={"rate_pct": "Poverty Rate (%)", "n": "N"}))
    fig = quintile_bar(tbl, "Poverty Rate by Cognitive Quintile (NLSY79)")
    fig.show()
```
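
For readers without access to `src/models.py`, the sketch below shows one way a quintile table of this kind could be computed. It is not the project's `quintile_table`; the function name `quintile_rates` is hypothetical, and the demonstration data are synthetic.

```python
import numpy as np
import pandas as pd


def quintile_rates(df: pd.DataFrame, outcome: str, cog_col: str = "afqt_z") -> pd.DataFrame:
    """Outcome rate (percent) and sample size within each cognitive quintile."""
    sub = df[[cog_col, outcome]].dropna()
    # qcut builds equal-count bins; label 1 is the lowest-scoring fifth.
    quintile = pd.qcut(sub[cog_col], q=5, labels=[1, 2, 3, 4, 5])
    grouped = sub.groupby(quintile, observed=True)[outcome]
    out = pd.DataFrame({"rate_pct": 100.0 * grouped.mean(), "n": grouped.size()})
    return out.rename_axis("quintile").reset_index()


# Synthetic data: the outcome is concentrated among the lowest scores.
demo = pd.DataFrame({"afqt_z": np.linspace(-2, 2, 100), "poor": [1] * 30 + [0] * 70})
table = quintile_rates(demo, "poor")
```

Quintile cut points here come from the analysis sample itself; a table based on national percentile norms would need fixed cut points instead.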
```{python}
#| label: fig-poverty-curve
#| fig-cap: "Predicted poverty probability vs. AFQT, SES held at median"
if "nlsy79" in dfs and "poor" in dfs["nlsy79"].columns:
    result = logit_replicate(dfs["nlsy79"], "poor")
    print(result.summary_df.to_string())
    fig = prediction_curve(result, "Poverty Probability vs. AFQT (NLSY79)")
    fig.show()
```
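
The kind of model behind these curves can be sketched without any project code. The example below fits a logistic regression by Newton-Raphson using NumPy only and recovers known coefficients from simulated data. It is not the implementation of `logit_replicate`; it omits the age and sex covariates for brevity, the function name `fit_logit` is hypothetical, and every number in it is synthetic.

```python
import numpy as np


def fit_logit(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-8):
    """Fit a logistic regression by Newton-Raphson.

    X holds one column per predictor; an intercept column is added here.
    Returns (coefficients, standard errors), intercept first.
    """
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)                     # Bernoulli variance weights
        hessian = X.T @ (X * w[:, None])      # observed information matrix
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Synthetic data only: simulate a binary outcome with known coefficients.
rng = np.random.default_rng(42)
n = 5_000
cog_z = rng.standard_normal(n)
ses_z = 0.4 * cog_z + rng.standard_normal(n)  # predictors are correlated
true_beta = np.array([-1.5, -0.8, -0.3])      # intercept, cognitive, SES
eta = true_beta[0] + true_beta[1] * cog_z + true_beta[2] * ses_z
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta_hat, se = fit_logit(np.column_stack([cog_z, ses_z]), y)
odds_ratios = np.exp(beta_hat[1:])            # per 1 SD change in each predictor
```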
---
## Educational Attainment {#sec-education}
*The Bell Curve*, Chapter 6: H&M examine high school dropout as a binary
outcome, arguing cognitive ability is a stronger predictor than SES.
```{python}
#| label: sec-educ-setup
if "nlsy79" in dfs:
    df79 = dfs["nlsy79"]
    if "educ_2018" in df79.columns:
        # Code dropout as educ_2018 < 12; missing education stays NaN.
        df79["dropout"] = (df79["educ_2018"] < 12).astype(float)
        df79.loc[df79["educ_2018"].isna(), "dropout"] = np.nan
```
```{python}
#| label: fig-dropout-curve
#| fig-cap: "Predicted high school dropout probability vs. AFQT, SES held at median"
if "nlsy79" in dfs and "dropout" in dfs["nlsy79"].columns:
    result = logit_replicate(dfs["nlsy79"], "dropout")
    print(result.summary_df.to_string())
    fig = prediction_curve(result, "High School Dropout Probability vs. AFQT (NLSY79)")
    fig.show()
```
---
## Unemployment {#sec-unemployment}
```{python}
#| label: fig-unemployment
for name in ["nlsy79", "gss"]:
    if name not in dfs:
        continue
    df = dfs[name]
    # Use AFQT where available, otherwise fall back to WORDSUM.
    cog = "afqt_z" if "afqt_z" in df.columns else "wordsum_z"
    if "unemployed" not in df.columns or cog not in df.columns:
        continue
    result = logit_replicate(df, "unemployed", cognitive_col=cog)
    print(f"\n=== {name.upper()} ===")
    print(result.summary_df.to_string())
    fig = prediction_curve(result, f"Unemployment Probability vs. Cognitive Score ({name.upper()})")
    fig.show()
```
---
## Crime and Incarceration {#sec-crime}
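*The Bell Curve*, Chapter 11: H&M report that AFQT strongly predicts
self-reported incarceration history, with low-IQ males ~5× more likely to have
been incarcerated than high-IQ males, holding SES constant. The models below use
the `ever_arrested` indicator where it is available, so they measure arrest
rather than incarceration and are not directly comparable to H&M's figure.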
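
A statement such as "5× more likely" is a ratio of probabilities, whereas a logit model reports odds ratios, and the two diverge as the outcome becomes more common. The helper functions below make the distinction concrete; the names are hypothetical and the functions take any two predicted probabilities as input.

```python
def risk_ratio(p_low: float, p_high: float) -> float:
    """How many times more likely the outcome is in the low-score group."""
    return p_low / p_high


def odds_ratio(p_low: float, p_high: float) -> float:
    """Ratio of odds between the two groups, the scale a logit model uses."""
    return (p_low / (1.0 - p_low)) / (p_high / (1.0 - p_high))
```

For rare outcomes the two measures are close, so comparing a replicated odds ratio with a published probability ratio is only approximate.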
```{python}
#| label: fig-crime
for name in ["nlsy79", "gss"]:
    if name not in dfs:
        continue
    df = dfs[name]
    # Use AFQT where available, otherwise fall back to WORDSUM.
    cog = "afqt_z" if "afqt_z" in df.columns else "wordsum_z"
    out = "ever_arrested" if "ever_arrested" in df.columns else None
    if not out or cog not in df.columns:
        continue
    result = logit_replicate(df, out, cognitive_col=cog)
    print(f"\n=== {name.upper()} ===")
    print(result.summary_df.to_string())
    fig = prediction_curve(result, f"Arrest Probability vs. Cognitive Score ({name.upper()})")
    fig.show()
```
---
## GSS Cross-Sectional Extension (1972–2022) {#sec-gss}
The GSS allows a rough replication across five decades. We use WORDSUM as the
cognitive proxy, re-estimate the poverty, unemployment, and arrest models where
those outcomes are available, and examine whether the WORDSUM odds ratio has
changed over time by estimating the models on decade subsets.
```{python}
#| label: gss-decade-analysis
import plotly.express as px

if "gss" in dfs:
    gss = dfs["gss"]
    if "year" in gss.columns and "wordsum_z" in gss.columns:
        gss["decade"] = (gss["year"] // 10) * 10
        results = []
        for decade, sub in gss.groupby("decade"):
            for outcome in ["poor", "unemployed", "ever_arrested"]:
                if outcome not in sub.columns:
                    continue
                try:
                    r = logit_replicate(sub, outcome, cognitive_col="wordsum_z")
                    coef = r.summary_df.loc["wordsum_z", "OR"] if "wordsum_z" in r.summary_df.index else np.nan
                    results.append({"decade": decade, "outcome": outcome, "OR_wordsum": coef, "n": r.n})
                except Exception as exc:
                    # Report cells where the model fails instead of dropping them silently.
                    print(f"Skipped {decade}s / {outcome}: {exc}")
        if results:
            res_df = pd.DataFrame(results)
            fig = px.line(
                res_df, x="decade", y="OR_wordsum", color="outcome",
                title="WORDSUM Odds Ratio by Decade (GSS)",
                labels={"OR_wordsum": "Odds Ratio (cognitive)", "decade": "Decade"},
                template="plotly_white",
            )
            # Reference line at OR = 1 (no association).
            fig.add_hline(y=1, line_dash="dash", line_color="gray")
            fig.show()
```
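
Decade subsets are much smaller than the pooled sample, so point estimates alone can overstate change over time. The sketch below converts a logit coefficient and its standard error into an odds ratio with a Wald confidence interval. The function name is hypothetical, and whether `summary_df` exposes standard errors is not shown in this document, so the inputs are plain numbers.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(coef: float, se: float, level: float = 0.95):
    """Odds ratio and Wald confidence interval from a logit coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)
```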
---
## Discussion
*[To be completed after the data are loaded and the results reviewed.]*
Key questions to address:
1. Do NLSY79 results replicate H&M's specific probability estimates within
reasonable tolerance (±5 percentage points)?
2. Is the AFQT coefficient larger in magnitude than the SES coefficient across
all outcomes, as H&M claim?
3. Do NLSY97 and GSS results show attenuation or amplification of the
cognitive-outcome gradient relative to NLSY79?
4. What proportion of the racial gap in outcomes is explained by AFQT vs. SES?
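
The tolerance check in question 1 could be made mechanical along the following lines. This is a sketch only: the function name and the dictionary keys are hypothetical, probabilities are expected as proportions, and no replicated values are supplied here because the analysis has not yet been run.

```python
def compare_estimates(published: dict, replicated: dict, tol_pp: float = 5.0) -> list:
    """Compare published and replicated probabilities, in percentage points."""
    rows = []
    for key, pub in published.items():
        rep = replicated.get(key)
        if rep is None:
            continue  # no replicated value for this estimate yet
        diff_pp = (rep - pub) * 100.0
        rows.append(
            {
                "estimate": key,
                "published": pub,
                "replicated": rep,
                "diff_pp": round(diff_pp, 2),
                # Small epsilon guards against floating-point noise at the boundary.
                "replicates": abs(diff_pp) <= tol_pp + 1e-9,
            }
        )
    return rows
```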
---
## Conclusion
*[To be completed after analysis.]*
---
## References
::: {#refs}
:::