CDC WONDER Mortality + Social Determinants of Health
18 Years of County-Level Deaths Joined with SDOH — 55K+ Records
Every CDC WONDER grouped mortality record from 1999 to 2016 (county × year × age group × sex × race/ethnicity × ICD-10 cause of death), joined with AHRQ's Social Determinants of Health database. Includes poverty rates, unemployment, uninsurance, primary care access, and a computed deprivation index at the county level. Three tables: a mortality table, an SDOH indicator table, and a pre-joined table ready for ML. Causes range from cardiovascular disease and cancer to drug overdose and suicide, giving county-level mortality together with its social context.
Why not download it directly from CDC WONDER?
CDC WONDER is a web query tool, not a bulk data API. Here's what we handled to produce this dataset:
- ✓ Bulk mortality extraction — CDC WONDER caps query results and requires manual form submissions. We scripted all county × demographic × cause × year combinations and reassembled them into a single flat file.
- ✓ Suppression flags preserved — NCHS suppresses cells with <10 deaths. We track is_suppressed as a boolean rather than dropping or imputing those rows, preserving the statistical signal.
- ✓ ICD-10 cause chapters computed — raw WONDER exports use 113 selected-cause codes. We mapped all codes to readable chapter labels (Cardiovascular Diseases, Neoplasms, Drug Overdose / Poisoning, Suicide, etc.) and retained individual ICD-10 codes for fine-grained analysis.
- ✓ YPLL computed — Years of Potential Life Lost before age 75 calculated per row using age-group midpoints and death counts, a standard epidemiological measure of premature mortality burden.
- ✓ AHRQ SDOH joined — five annual vintages (2016–2020) of the AHRQ Social Determinants of Health database merged at the county level with nearest-prior-year matching for 1999–2015 records.
- ✓ Deprivation index computed — composite Z-score from poverty rate, unemployment rate, and uninsurance rate. Classified into quintiles (1 = least deprived, 5 = most deprived) for immediate use as a categorical feature.
- ✓ USDA rural/urban codes merged — RUCC 1–9 classification (Metropolitan / Micropolitan / Rural) joined at county level for rural health stratification.
- ✓ Race/ethnicity normalised — CDC's race codes collapsed to a consistent 5-group classification across all years (Non-Hispanic White, Black / African American, Hispanic / Latino, American Indian / Alaska Native, Asian / Pacific Islander).
- ✓ Pre-joined ML table — mortality and SDOH combined in a single DataFrame so you can start modeling immediately without managing a multi-table join.
- ✓ Parquet output — columnar storage for fast filtering across 55K+ rows without loading the full CSV.
⏱ Skip the WONDER query assembly, FIPS normalisation, SDOH merging, and deprivation index computation. Ready for analysis in minutes.
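The suppression handling described in the list above can be sketched as follows. This is a minimal illustration, not the shipped pipeline: it assumes a raw export where the death count column is named `Deaths` and suppressed cells carry the text `Suppressed`. The function name `flag_suppressed` and the sample rows are hypothetical.

```python
import pandas as pd


def flag_suppressed(raw: pd.DataFrame, deaths_col: str = "Deaths") -> pd.DataFrame:
    """Add an is_suppressed flag and a numeric deaths column.

    Assumption: suppressed cells contain the literal text "Suppressed".
    Rows are kept, not dropped or imputed.
    """
    out = raw.copy()
    as_text = out[deaths_col].astype(str).str.strip()
    out["is_suppressed"] = as_text.str.lower().eq("suppressed")
    # Non-numeric entries (including "Suppressed") become NaN.
    out["deaths"] = pd.to_numeric(as_text, errors="coerce")
    return out


# Hypothetical two-row export for illustration only.
raw = pd.DataFrame(
    {"County Code": ["06037", "06003"], "Deaths": ["1523", "Suppressed"]}
)
flagged = flag_suppressed(raw)
print(flagged[["County Code", "deaths", "is_suppressed"]])
```

Keeping the flag separate from the null count lets downstream code distinguish "fewer than 10 deaths" from "no data".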
- Mortality Records: 55K+
- Counties: 3,100+
- Years Covered: 1999–2016
- SDOH Indicators: 14
Use Cases
Schema — Mortality Table
cdc_wonder_mortality.csv / .parquet — one row per county × year × age × sex × race × cause combination
| Field | Type | Description |
|---|---|---|
| record_id | int | Unique row identifier |
| county_fips | string | 5-digit county FIPS code (e.g. 06037 = Los Angeles) |
| state_fips | string | 2-digit state FIPS code |
| county_name | string | County name |
| state_name | string | State name |
| year | int | Year (1999–2016) |
| age_group_label | string | Age group: <1, 1-4, 5-14, 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85+ |
| age_group_numeric_start | int | Numeric start of age group for sorting |
| sex | string | Male / Female |
| race_ethnicity | string | Race/ethnicity: Non-Hispanic White, Black / African American, Hispanic / Latino, American Indian / Alaska Native, Asian / Pacific Islander |
| cause_icd10 | string | ICD-10 underlying cause of death code |
| cause_label | string | Human-readable cause of death label |
| cause_chapter | string | ICD-10 chapter grouping (computed): Cardiovascular Diseases, Neoplasms, Respiratory Diseases, etc. |
| deaths | float | Number of deaths (null where suppressed — <10 deaths per NCHS policy) |
| is_suppressed | bool | True where death count suppressed for privacy (<10 deaths in cell) |
| population | float | County population denominator for the year/demographic group |
| crude_rate | float | Crude mortality rate per 100,000 population |
| crude_rate_95ci_lower | float | Lower bound of 95% confidence interval for crude rate |
| crude_rate_95ci_upper | float | Upper bound of 95% confidence interval for crude rate |
| mortality_tier | string | Very Low / Low / Moderate / High / Very High — quintile classification within cause × year (computed) |
| years_potential_life_lost | float | YPLL before age 75: deaths × (75 − midpoint age). Quantifies premature mortality burden (computed) |
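The YPLL formula in the last row, deaths × (75 − midpoint age), can be sketched as below. The midpoint values in `AGE_MIDPOINTS` are assumptions chosen for illustration (for example, 50 for the 45-54 group); the midpoints used in the actual pipeline are not stated here. Age groups starting at 75 or older contribute zero years lost.

```python
import pandas as pd

# Assumed midpoints for each age group below the reference age of 75.
AGE_MIDPOINTS = {
    "<1": 0.5, "1-4": 3.0, "5-14": 10.0, "15-24": 20.0, "25-34": 30.0,
    "35-44": 40.0, "45-54": 50.0, "55-64": 60.0, "65-74": 70.0,
}


def add_ypll(df: pd.DataFrame, reference_age: float = 75.0) -> pd.DataFrame:
    """Compute years of potential life lost per row."""
    out = df.copy()
    midpoint = out["age_group_label"].map(AGE_MIDPOINTS)
    # Labels missing from the mapping (75-84, 85+) map to NaN and count as zero.
    years_lost = (reference_age - midpoint).clip(lower=0).fillna(0.0)
    # Suppressed rows have null deaths, so their YPLL stays null.
    out["years_potential_life_lost"] = out["deaths"] * years_lost
    return out


# Hypothetical rows for illustration only.
sample = pd.DataFrame(
    {"age_group_label": ["45-54", "85+", "<1"], "deaths": [12.0, 40.0, None]}
)
ypll = add_ypll(sample)
print(ypll)
```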
Schema — SDOH Table
cdc_wonder_sdoh.csv / .parquet — one row per county × year (AHRQ SDOH vintages 2016–2020)
| Field | Type | Description |
|---|---|---|
| sdoh_record_id | int | Unique SDOH row identifier |
| county_fips | string | 5-digit county FIPS code (join key to mortality table) |
| state_fips | string | 2-digit state FIPS code |
| state_name | string | State name |
| year | int | SDOH data year (2016–2020, from AHRQ) |
| poverty_pct | float | % of county population below 185% of the federal poverty level (ACS) |
| median_household_income | float | Median household income in dollars (ACS) |
| unemployment_rate | float | Annual county unemployment rate — % (BLS) |
| uninsured_pct | float | % of county population without health insurance (ACS) |
| primary_care_per_100k | float | Primary care physicians per 100,000 population (AHRF) |
| college_pct | float | % of adults with a bachelor's degree or higher (ACS) |
| housing_cost_burdened_pct | float | % of households spending 30%+ of income on housing (ACS) |
| food_environment_index | float | USDA food environment index (0–10; higher = better food access) |
| air_quality_index_avg | float | Average annual Air Quality Index for the county (EPA AQS) |
| rural_urban_code | string | USDA Rural-Urban Continuum Code 1–9 (1 = large metro, 9 = completely rural) |
| rural_urban_label | string | Metropolitan / Micropolitan / Rural label (computed from RUCC) |
| deprivation_index | float | Composite deprivation Z-score: mean of poverty, unemployment, uninsured Z-scores (computed) |
| deprivation_quintile | int | 1 (least deprived) to 5 (most deprived) quintile classification (computed) |
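As a rough sketch of how the two computed deprivation fields could be derived from the columns above: each component is standardised, the three Z-scores are averaged, and the result is cut into quintiles. Standardising within each year and breaking ties by rank are assumptions made for this example; the source does not say which reference population or tie rule was used.

```python
import pandas as pd

COMPONENTS = ["poverty_pct", "unemployment_rate", "uninsured_pct"]


def add_deprivation(sdoh: pd.DataFrame) -> pd.DataFrame:
    """Add deprivation_index (mean Z-score) and deprivation_quintile (1-5)."""
    out = sdoh.copy()
    grouped = out.groupby("year")[COMPONENTS]
    # Z-score of each component relative to all counties in the same year.
    z = (out[COMPONENTS] - grouped.transform("mean")) / grouped.transform("std")
    out["deprivation_index"] = z.mean(axis=1)
    # Rank first so tied index values cannot produce duplicate bin edges.
    out["deprivation_quintile"] = out.groupby("year")["deprivation_index"].transform(
        lambda s: pd.qcut(s.rank(method="first"), 5, labels=False) + 1
    )
    return out


# Hypothetical five-county table for illustration only.
toy = pd.DataFrame(
    {
        "county_fips": ["00001", "00002", "00003", "00004", "00005"],
        "year": [2016] * 5,
        "poverty_pct": [10.0, 20.0, 30.0, 40.0, 50.0],
        "unemployment_rate": [2.0, 4.0, 6.0, 8.0, 10.0],
        "uninsured_pct": [5.0, 10.0, 15.0, 20.0, 25.0],
    }
)
scored = add_deprivation(toy)
print(scored[["county_fips", "deprivation_index", "deprivation_quintile"]])
```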
The pre-joined table (cdc_wonder_mortality_sdoh_joined.csv / .parquet) combines all mortality columns with SDOH indicators on county_fips, using nearest-prior-year SDOH matching. Use this table to start ML workflows without managing the join yourself.
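If you prefer to rebuild the join yourself, `pandas.merge_asof` is one way to match each mortality row to an SDOH vintage for the same county. The sketch below is not the shipped pipeline. Because the earliest SDOH vintage is 2016, it assumes nearest-available-year matching (`direction="nearest"`), so earlier mortality years fall back to the 2016 vintage. The function name `join_sdoh` and the sample rows are hypothetical.

```python
import pandas as pd


def join_sdoh(mort: pd.DataFrame, sdoh: pd.DataFrame) -> pd.DataFrame:
    """Attach the closest SDOH vintage to each mortality row, per county."""
    left = mort.sort_values("year")
    right = (
        # Drop columns duplicated in the mortality table to avoid suffixes.
        sdoh.drop(columns=["state_fips", "state_name"], errors="ignore")
        .rename(columns={"year": "sdoh_year"})
        .sort_values("sdoh_year")
    )
    return pd.merge_asof(
        left,
        right,
        left_on="year",
        right_on="sdoh_year",
        by="county_fips",
        direction="nearest",  # assumption: closest vintage in either direction
    )


# Hypothetical rows for illustration only.
mort_toy = pd.DataFrame(
    {
        "record_id": [1, 2, 3],
        "county_fips": ["06037", "06037", "01001"],
        "year": [2005, 2016, 2010],
    }
)
sdoh_toy = pd.DataFrame(
    {
        "county_fips": ["06037", "06037", "01001"],
        "year": [2016, 2017, 2016],
        "poverty_pct": [30.0, 31.0, 25.0],
    }
)
joined_toy = join_sdoh(mort_toy, sdoh_toy).sort_values("record_id")
print(joined_toy)
```

Keeping `sdoh_year` in the output makes it easy to see how far apart the mortality year and the matched SDOH vintage are for any given row.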
Quick Start
```python
import pandas as pd

# Load mortality table
mort = pd.read_parquet("cdc_wonder_mortality.parquet")

# Top-10 counties by cardiovascular YPLL (non-suppressed rows only)
cardio = mort[
    (mort.cause_chapter == "Cardiovascular Diseases")
    & (~mort.is_suppressed)
]
top_ypll = (
    cardio.groupby(["county_name", "state_name"])["years_potential_life_lost"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)
print(top_ypll)

# Load pre-joined table for ML
joined = pd.read_parquet("cdc_wonder_mortality_sdoh_joined.parquet")

# Select drug overdose rows with SDOH features and the crude-rate target,
# dropping rows with missing values (including suppressed cells)
features = joined[joined.cause_chapter == "Drug Overdose / Poisoning"][[
    "deprivation_quintile", "rural_urban_code",
    "poverty_pct", "uninsured_pct", "crude_rate"
]].dropna()

# Mean drug overdose rate in quintile 5 (most deprived) vs quintile 1
q5 = features[features.deprivation_quintile == 5]["crude_rate"].mean()
q1 = features[features.deprivation_quintile == 1]["crude_rate"].mean()
print(f"Drug overdose rate ratio Q5/Q1: {q5/q1:.2f}x")
```
Pairs Well With
Pricing
Free
$129
55K+ county × year mortality records, 1999–2016, CSV + Parquet. All 50 states + DC, 3,100+ counties.
License: Commercial License
$249/yr
Complete dataset + annual updates as CDC refreshes mortality data
License: Commercial License
Data Provenance
- Mortality: CDC WONDER Compressed Mortality File, Underlying Cause of Death 1999–2016 (ICD-10). Source: National Center for Health Statistics (NCHS), CDC. Public domain — U.S. government work.
- SDOH indicators: AHRQ Social Determinants of Health Database, vintages 2016–2020. Includes ACS (Census Bureau), BLS, AHRF (HRSA), USDA, HUD, EPA sources. All public domain.
- Rural/urban classification: USDA Economic Research Service Rural-Urban Continuum Codes 2013. Public domain.
- ClarityStorm processing: ICD-10 chapter mapping, YPLL computation, deprivation index, SDOH join, suppression flag preservation, and Parquet conversion. Pipeline source code included in purchase.
- License: Free sample is public domain. ClarityStorm Commercial Data License covers paid tiers — internal use, no redistribution or resale of raw data.