
CDC WONDER Mortality + Social Determinants of Health

18 Years of County-Level Deaths Joined with SDOH — 55K+ Records

Every CDC WONDER grouped mortality record from 1999 to 2016 (county × year × age group × sex × race/ethnicity × ICD-10 cause of death), joined with AHRQ's Social Determinants of Health database. County-level indicators include poverty rates, unemployment, uninsurance, primary care access, and a computed deprivation index. Three tables are provided: a mortality table, an SDOH indicator table, and a pre-joined table ready for ML. Causes range from cardiovascular disease and cancer to drug overdose and suicide, so county-level mortality can be analysed alongside its social context.

Sample: Public Domain · Paid: Commercial License · CSV + Parquet · Annual Updates · 1999–2016 · ICD-10 Cause Chapters · Deprivation Index · YPLL Computed · 3,100+ Counties · 55K+ Records

Why not download it directly from CDC WONDER?

CDC WONDER is a web query tool, not a bulk data API. Here's what we handled to produce this dataset:

  • Bulk mortality extraction — CDC WONDER caps query results and requires manual form submissions. We scripted all county × demographic × cause × year combinations and reassembled them into a single flat file.
  • Suppression flags preserved — NCHS suppresses cells with <10 deaths. We track is_suppressed as a boolean rather than dropping or imputing those rows, preserving the statistical signal.
  • ICD-10 cause chapters computed — raw WONDER exports use 113 selected-cause codes. We mapped all codes to readable chapter labels (Cardiovascular Diseases, Neoplasms, Drug Overdose / Poisoning, Suicide, etc.) and retained individual ICD-10 codes for fine-grained analysis.
  • YPLL computed — Years of Potential Life Lost before age 75 calculated per row using age-group midpoints and death counts, a standard epidemiological measure of premature mortality burden.
  • AHRQ SDOH joined — five annual vintages (2016–2020) of the AHRQ Social Determinants of Health database merged at the county level with nearest-prior-year matching for 1999–2015 records.
  • Deprivation index computed — composite Z-score from poverty rate, unemployment rate, and uninsurance rate. Classified into quintiles (1 = least deprived, 5 = most deprived) for immediate use as a categorical feature.
  • USDA rural/urban codes merged — RUCC 1–9 classification (Metropolitan / Micropolitan / Rural) joined at county level for rural health stratification.
  • Race/ethnicity normalised — CDC's race codes collapsed to a consistent 5-group classification across all years (Non-Hispanic White, Black / African American, Hispanic / Latino, American Indian / Alaska Native, Asian / Pacific Islander).
  • Pre-joined ML table — mortality and SDOH combined in a single DataFrame so you can start modeling immediately without managing a multi-table join.
  • Parquet output — columnar storage for fast filtering across 55K+ rows without loading the full CSV.
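The suppression handling in the list above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: it assumes the raw export has a `Deaths` column in which suppressed cells hold the literal text "Suppressed", and the helper name `flag_suppressed` is hypothetical.

```python
import pandas as pd

def flag_suppressed(raw: pd.DataFrame, col: str = "Deaths") -> pd.DataFrame:
    """Return a copy with a numeric `deaths` column and an `is_suppressed` flag."""
    out = raw.copy()
    # Assumption: suppressed cells carry the literal text "Suppressed".
    out["is_suppressed"] = out[col].astype(str).str.strip().eq("Suppressed")
    # Non-numeric values (including "Suppressed") become NaN rather than 0,
    # so a suppressed cell is never mistaken for a true zero count.
    out["deaths"] = pd.to_numeric(out[col], errors="coerce")
    return out.drop(columns=[col])

# Toy input: the middle row is a suppressed cell.
raw = pd.DataFrame({"Deaths": ["12", "Suppressed", "45"]})
flagged = flag_suppressed(raw)
```

Keeping the row with a null count and an explicit flag lets downstream code choose between excluding the cell, bounding it (1 to 9 deaths), or modelling it as censored.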

⏱ Skip the WONDER query assembly, FIPS normalisation, SDOH merging, and deprivation index computation. Ready for analysis in minutes.
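For readers curious what FIPS normalisation typically involves, here is a small sketch. It assumes county codes may arrive as integers, floats, or unpadded strings; the helper name `normalize_fips` is hypothetical and not part of the dataset or its pipeline.

```python
import pandas as pd

def normalize_fips(values: pd.Series) -> pd.Series:
    """Coerce county FIPS codes to zero-padded 5-character strings."""
    # Work on text so leading zeros can be restored (e.g. 6037 -> "06037").
    as_text = values.astype(str).str.strip()
    # Drop a trailing ".0" left behind when codes were read as floats.
    as_text = as_text.str.replace(r"\.0$", "", regex=True)
    return as_text.str.zfill(5)

# Mixed input types, as often seen when combining government files.
codes = normalize_fips(pd.Series([6037, "1001", 36061.0]))
```

Storing `county_fips` as a string, as both tables here do, avoids losing the leading zero for states with FIPS codes below 10.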

55K+ Mortality Records
3,100+ Counties
1999–2016 Years Covered
14 SDOH Indicators

Use Cases

Epidemiological Research & Health Disparities
Quantify mortality rate disparities by race/ethnicity, age group, and county across 18 years. Examine how cardiovascular disease, cancer, and drug overdose mortality rates diverge by deprivation quintile. Track the opioid epidemic's county-level progression through cause-specific YPLL trends from 1999 to 2016.
Social Determinants of Health Modeling
The pre-joined mortality × SDOH table lets you train models that directly predict mortality rates from poverty, unemployment, uninsurance, and healthcare access — all in a single DataFrame. No manual joins across disparate government data sources. Test whether primary care shortage counties show higher preventable mortality even after controlling for age and race.
Public Health Program Evaluation
Identify counties with persistently high age-adjusted cardiovascular mortality rates despite average or below-average deprivation scores — potential high-value intervention targets. Evaluate natural experiments: counties where major hospital closures correlate with mortality changes across ICD-10 chapters. Benchmark states on avoidable premature mortality (YPLL) by cause.
Insurance & Actuarial Risk Modeling
Build county-level mortality risk models incorporating 18 years of cause-specific rate trends and SDOH covariates. Estimate excess mortality exposure for specific age-sex-race-county cohorts. Support life insurance product pricing, long-term care underwriting, and Medicare Advantage geographic risk adjustment.
AI/ML Health Prediction Applications
Train regression models to predict county-level crude or age-adjusted mortality rates from SDOH features. Classify counties into deprivation × mortality quadrants to identify 'high-need, high-mortality' vs 'high-need, low-mortality' outliers. Build time-series forecasts of cause-specific mortality by county using 18 years of ground truth.
Policy & Advocacy — Rural Health Equity
Rural counties (RUCC 7–9) have systematically higher uninsurance rates, fewer primary care physicians, and higher premature mortality. This dataset puts all those dimensions in one file. Quantify the rural-urban mortality gap by cause and state. Support Medicaid expansion impact analyses by comparing states pre- and post-expansion on uninsurance and cause-specific mortality trends.
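The deprivation × mortality quadrant idea mentioned under AI/ML applications could be sketched like this. It assumes one row per county, already aggregated, and uses a simple median split on `deprivation_index` and `crude_rate`; the split rule, the helper name `label_quadrants`, and the toy FIPS codes are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def label_quadrants(counties: pd.DataFrame) -> pd.Series:
    """Label each county by a median split on deprivation and crude rate."""
    high_need = counties["deprivation_index"] > counties["deprivation_index"].median()
    high_mort = counties["crude_rate"] > counties["crude_rate"].median()
    need = pd.Series(np.where(high_need, "high-need", "low-need"), index=counties.index)
    mort = pd.Series(np.where(high_mort, "high-mortality", "low-mortality"), index=counties.index)
    return need + ", " + mort

# Toy data with made-up county codes and values.
toy = pd.DataFrame({
    "county_fips": ["00001", "00002", "00003", "00004"],
    "deprivation_index": [-1.0, -0.5, 0.5, 1.0],
    "crude_rate": [10.0, 30.0, 12.0, 40.0],
})
toy["quadrant"] = label_quadrants(toy)
```

Counties labelled "high-need, low-mortality" are the outliers worth a closer look; a median split is only one possible threshold, and quintile cut-offs would work the same way.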

Schema — Mortality Table

cdc_wonder_mortality.csv / .parquet — one row per county × year × age × sex × race × cause combination

Field | Type | Description
record_id | int | Unique row identifier
county_fips | string | 5-digit county FIPS code (e.g. 06037 = Los Angeles)
state_fips | string | 2-digit state FIPS code
county_name | string | County name
state_name | string | State name
year | int | Year (1999–2016)
age_group_label | string | Age group: <1, 1-4, 5-14, 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85+
age_group_numeric_start | int | Numeric start of age group for sorting
sex | string | Male / Female
race_ethnicity | string | Race/ethnicity: Non-Hispanic White, Black / African American, Hispanic / Latino, American Indian / Alaska Native, Asian / Pacific Islander
cause_icd10 | string | ICD-10 underlying cause of death code
cause_label | string | Human-readable cause of death label
cause_chapter | string | ICD-10 chapter grouping (computed): Cardiovascular Diseases, Neoplasms, Respiratory Diseases, etc.
deaths | float | Number of deaths (null where suppressed: <10 deaths per NCHS policy)
is_suppressed | bool | True where death count suppressed for privacy (<10 deaths in cell)
population | float | County population denominator for the year/demographic group
crude_rate | float | Crude mortality rate per 100,000 population
crude_rate_95ci_lower | float | Lower bound of 95% confidence interval for crude rate
crude_rate_95ci_upper | float | Upper bound of 95% confidence interval for crude rate
mortality_tier | string | Very Low / Low / Moderate / High / Very High: quintile classification within cause × year (computed)
years_potential_life_lost | float | YPLL before age 75: deaths × (75 − midpoint age). Quantifies premature mortality burden (computed)
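The YPLL formula in the schema, deaths × (75 − midpoint age), can be illustrated with a short sketch. The midpoint values below are assumptions (each group treated as spanning from its start age to the start of the next group); the dataset's exact midpoints are not stated here, and the helper name `ypll_before_75` is hypothetical.

```python
import pandas as pd

# Assumed age-group midpoints; the shipped pipeline may use different values.
AGE_MIDPOINTS = {
    "<1": 0.5, "1-4": 3.0, "5-14": 10.0, "15-24": 20.0, "25-34": 30.0,
    "35-44": 40.0, "45-54": 50.0, "55-64": 60.0, "65-74": 70.0,
    # Any midpoint at or above 75 contributes zero years lost.
    "75-84": 80.0, "85+": 90.0,
}

def ypll_before_75(df: pd.DataFrame) -> pd.Series:
    """Years of potential life lost before age 75, per row."""
    midpoint = df["age_group_label"].map(AGE_MIDPOINTS)
    # Deaths at or after 75 count as zero rather than negative years lost.
    years_lost_per_death = (75 - midpoint).clip(lower=0)
    # Suppressed rows (null deaths) stay null instead of becoming zero.
    return df["deaths"] * years_lost_per_death

toy = pd.DataFrame({
    "age_group_label": ["45-54", "85+", "<1"],
    "deaths": [10.0, 100.0, None],
})
toy["years_potential_life_lost"] = ypll_before_75(toy)
```

Under these assumed midpoints, 10 deaths in the 45-54 group contribute 10 × (75 − 50) = 250 years, while deaths at 85+ contribute none.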

Schema — SDOH Table

cdc_wonder_sdoh.csv / .parquet — one row per county × year (AHRQ SDOH vintages 2016–2020)

Field | Type | Description
sdoh_record_id | int | Unique SDOH row identifier
county_fips | string | 5-digit county FIPS code (join key to mortality table)
state_fips | string | 2-digit state FIPS code
state_name | string | State name
year | int | SDOH data year (2016–2020, from AHRQ)
poverty_pct | float | % of county population below 185% of the federal poverty level (ACS)
median_household_income | float | Median household income in dollars (ACS)
unemployment_rate | float | Annual county unemployment rate, % (BLS)
uninsured_pct | float | % of county population without health insurance (ACS)
primary_care_per_100k | float | Primary care physicians per 100,000 population (AHRF)
college_pct | float | % of adults with a bachelor's degree or higher (ACS)
housing_cost_burdened_pct | float | % of households spending 30%+ of income on housing (ACS)
food_environment_index | float | USDA food environment index (0–10; higher = better food access)
air_quality_index_avg | float | Average annual Air Quality Index for the county (EPA AQS)
rural_urban_code | string | USDA Rural-Urban Continuum Code 1–9 (1 = large metro, 9 = completely rural)
rural_urban_label | string | Metropolitan / Micropolitan / Rural label (computed from RUCC)
deprivation_index | float | Composite deprivation Z-score: mean of poverty, unemployment, uninsured Z-scores (computed)
deprivation_quintile | int | 1 (least deprived) to 5 (most deprived) quintile classification (computed)
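To make the deprivation index definition concrete, here is a minimal sketch of a mean-of-Z-scores composite with quintile binning. It assumes Z-scores are computed across all rows passed in, using the population standard deviation; whether the shipped index standardises within each year is not stated here, and the helper name `add_deprivation` is hypothetical.

```python
import pandas as pd

COMPONENTS = ["poverty_pct", "unemployment_rate", "uninsured_pct"]

def add_deprivation(sdoh: pd.DataFrame) -> pd.DataFrame:
    """Add a mean-of-Z-scores deprivation index and its quintile (1 to 5)."""
    out = sdoh.copy()
    # Standardise each component across the rows provided (assumption).
    z = (out[COMPONENTS] - out[COMPONENTS].mean()) / out[COMPONENTS].std(ddof=0)
    out["deprivation_index"] = z.mean(axis=1)
    # Equal-count bins: 1 = least deprived, 5 = most deprived.
    out["deprivation_quintile"] = pd.qcut(
        out["deprivation_index"], q=5, labels=[1, 2, 3, 4, 5]
    ).astype(int)
    return out

# Toy data with made-up values for five counties.
toy = add_deprivation(pd.DataFrame({
    "poverty_pct": [10.0, 20.0, 30.0, 40.0, 50.0],
    "unemployment_rate": [2.0, 4.0, 6.0, 8.0, 10.0],
    "uninsured_pct": [5.0, 10.0, 15.0, 20.0, 25.0],
}))
```

Because each component is standardised before averaging, no single indicator dominates the composite just because it is measured on a larger scale.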

The pre-joined table (cdc_wonder_mortality_sdoh_joined.csv / .parquet) combines all mortality columns with SDOH indicators on county_fips, using nearest-prior-year SDOH matching. Use this table to start ML workflows without managing the join yourself.
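If you want to rebuild a prior-year join yourself from the two separate tables, `pandas.merge_asof` is one way to do it. The sketch below is an assumption about how such a join could be written, not the pipeline's actual code; the helper name `join_prior_year_sdoh` and the toy values are hypothetical. How the shipped table treats mortality years earlier than the first SDOH vintage is not specified here; in this sketch those rows simply get nulls, and `direction="nearest"` is an alternative.

```python
import pandas as pd

def join_prior_year_sdoh(mort: pd.DataFrame, sdoh: pd.DataFrame) -> pd.DataFrame:
    """Attach to each mortality row the latest SDOH row at or before its year."""
    # merge_asof requires both frames to be sorted by their time key.
    left = mort.sort_values("year")
    right = sdoh.sort_values("year").rename(columns={"year": "sdoh_year"})
    return pd.merge_asof(
        left, right,
        left_on="year", right_on="sdoh_year",
        by="county_fips",          # match within the same county only
        direction="backward",      # take the most recent SDOH year <= mortality year
    )

# Toy data: made-up values for a single county.
mort = pd.DataFrame({"county_fips": ["06037"] * 3,
                     "year": [2015, 2016, 2017],
                     "crude_rate": [100.0, 110.0, 120.0]})
sdoh = pd.DataFrame({"county_fips": ["06037"] * 2,
                     "year": [2016, 2018],
                     "poverty_pct": [30.0, 28.0]})
joined = join_prior_year_sdoh(mort, sdoh)
```

Keeping `sdoh_year` in the output makes it easy to see how far apart the matched years are and to filter out matches you consider too stale.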

Quick Start

import pandas as pd

# Load mortality table
mort = pd.read_parquet("cdc_wonder_mortality.parquet")

# Top-10 counties by cardiovascular YPLL (non-suppressed rows only)
cardio = mort[
    (mort.cause_chapter == "Cardiovascular Diseases") &
    (~mort.is_suppressed)
]
top_ypll = (
    cardio.groupby(["county_name", "state_name"])["years_potential_life_lost"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)
print(top_ypll)

# Load pre-joined table for ML
joined = pd.read_parquet("cdc_wonder_mortality_sdoh_joined.parquet")

# Select drug overdose rows with candidate model features (crude_rate is the target)
features = joined[joined.cause_chapter == "Drug Overdose / Poisoning"][[
    "deprivation_quintile", "rural_urban_code",
    "poverty_pct", "uninsured_pct", "crude_rate"
]].dropna()

# Deprivation quintile 5 (most deprived) drug overdose rate vs quintile 1
q5 = features[features.deprivation_quintile == 5]["crude_rate"].mean()
q1 = features[features.deprivation_quintile == 1]["crude_rate"].mean()
print(f"Drug overdose rate ratio Q5/Q1: {q5/q1:.2f}x")

Pairs Well With

Join county drug overdose mortality with FAERS adverse event reports by drug class. Validate county-level overdose trends against reported adverse event volume.
Correlate financial distress (complaint rates by product) with county deprivation index and cardiovascular or mental health mortality rates.
Layer occupational injury rates with county-level SDOH and mortality to build a holistic county health risk profile for ESG and insurance applications.

Pricing

Sample: Free
1,000 rows (CSV) + full schema documentation
License: Public Domain

Complete: $129
55K+ county × year mortality records, 1999–2016, CSV + Parquet. All 50 states + DC, 3,100+ counties.
License: Commercial License

Annual: $249/yr
Complete dataset + annual updates as CDC refreshes mortality data
License: Commercial License

Data Provenance

  • Mortality: CDC WONDER Compressed Mortality File, Underlying Cause of Death 1999–2016 (ICD-10). Source: National Center for Health Statistics (NCHS), CDC. Public domain — U.S. government work.
  • SDOH indicators: AHRQ Social Determinants of Health Database, vintages 2016–2020. Includes ACS (Census Bureau), BLS, AHRF (HRSA), USDA, HUD, EPA sources. All public domain.
  • Rural/urban classification: USDA Economic Research Service Rural-Urban Continuum Codes 2013. Public domain.
  • ClarityStorm processing: ICD-10 chapter mapping, YPLL computation, deprivation index, SDOH join, suppression flag preservation, and Parquet conversion. Pipeline source code included in purchase.
  • License: Free sample is public domain. ClarityStorm Commercial Data License covers paid tiers — internal use, no redistribution or resale of raw data.