NHTSA FARS 1975–2023
US Fatal Crash Database — AI-Ready Structured Dataset
49 years of US fatal motor vehicle crash data from the NHTSA Fatality Analysis Reporting System. Every crash on a US public road resulting in a death within 30 days — cleaned, standardised, and structured for AI/ML workflows. Three relational tables covering ~990K crash events, ~1.5M vehicles, and ~2.8M persons.
Why not just download it from NHTSA?
You can — it's public domain. Here's what we saved you:
- ✓ Harmonized 49 years of schema changes — NHTSA's column names, types, and coding schemes shifted repeatedly across decades
- ✓ Standardized crash_date and crash_time fields across every format variation NHTSA has used since 1975
- ✓ Added human-readable fields: state_name and day_of_week alongside the original FIPS codes
- ✓ Consistent join keys across all 3 tables (Accidents, Vehicles, Persons) using st_case + year
- ✓ Parquet output: NHTSA ships SAS format exports; we convert to columnar Parquet for fast queries
⏱ Skip ~4–6 hours of data archaeology. 49 annual releases, each in SAS format, each slightly different.
What you'd need to do yourself:
- Download 49 annual zip files from the NHTSA FARS portal (each contains multiple tables)
- Read SAS export format, which requires SAS, R, or Python's pyreadstat to parse
- Reconcile field names that changed across NHTSA annual releases
- Handle coding scheme changes (e.g., state codes, restraint codes) that shifted over 5 decades
- Merge into a consistent schema without breaking the Accident–Vehicle–Person join keys
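To give a feel for the reconciliation steps listed above, here is a minimal sketch of renaming columns and building crash_date and crash_time for a single raw record. It is not the pipeline used to build this dataset. The alias table, the helper names (harmonize_record, _iso_date, _hhmm), the two-digit year in older files, and the use of out-of-range values such as 99 for "unknown" are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical alias table: raw column name (upper-cased) -> harmonized name.
# Illustrative only; a real mapping would be far longer and year-specific.
ALIASES = {
    "ST_CASE": "st_case",
    "YEAR": "year",
    "MONTH": "month",
    "DAY": "day",
    "HOUR": "hour",
    "MINUTE": "minute",
    "FATALS": "fatals",
}


def _iso_date(year, month, day):
    """Return YYYY-MM-DD, or None when the parts do not form a valid date."""
    try:
        return date(year, int(month), int(day)).isoformat()
    except (TypeError, ValueError):
        return None


def _hhmm(hour, minute):
    """Return HH:MM, or None for missing or out-of-range values."""
    try:
        h, m = int(hour), int(minute)
    except (TypeError, ValueError):
        return None
    if not (0 <= h <= 23 and 0 <= m <= 59):
        return None  # assumed to cover "unknown" markers such as 99
    return f"{h:02d}:{m:02d}"


def harmonize_record(raw, file_year):
    """Rename known columns and derive crash_date / crash_time."""
    rec = {ALIASES[k.upper()]: v for k, v in raw.items() if k.upper() in ALIASES}
    # Assumption: older files may carry a two-digit year, so the year of the
    # annual release is used instead of the value stored in the record.
    rec["year"] = file_year
    rec["crash_date"] = _iso_date(file_year, rec.pop("month", None), rec.pop("day", None))
    rec["crash_time"] = _hhmm(rec.pop("hour", None), rec.pop("minute", None))
    return rec


# Made-up record in the style of an early annual file.
example = harmonize_record(
    {"st_case": 10001, "YEAR": 75, "MONTH": 1, "DAY": 4,
     "HOUR": 22, "MINUTE": 5, "FATALS": 1},
    file_year=1975,
)
print(example)
```

Returning None for unparseable dates and times keeps unknown values distinguishable from real ones, rather than coercing them to a default such as midnight.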
~990K
Fatal Crashes
49
Years
52
States
3
Tables
Use Cases
Train classifiers to predict crash severity from road, vehicle, and environmental features across 49 years of data.
Cluster fatal crashes by lat/lon to identify high-risk corridors and intersections nationwide.
49 years of continuous data enables time-series modeling of fatality rates, drunk driving trends, and seatbelt adoption.
Detailed per-person and per-vehicle impairment indicators for public health and policy research.
Link crash outcomes to vehicle type, model year, and restraint use to evaluate safety equipment effectiveness.
National-scale fatal crash data for risk scoring, underwriting models, and telematics calibration.
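As a rough starting point for the hotspot use case above, the sketch below counts crashes per latitude/longitude grid cell. This is plain grid binning, a coarse stand-in for a real clustering method such as DBSCAN. The function name grid_hotspots, the 0.1 degree default cell size, and the handling of missing or out-of-range coordinates are assumptions, not part of the dataset.

```python
import math
from collections import Counter


def grid_hotspots(points, cell_deg=0.1, top_n=3):
    """Count crashes per lat/lon grid cell and return the busiest cells.

    points: iterable of (latitude, longitude) pairs in decimal degrees.
    Returns a list of ((lat_index, lon_index), count) tuples, where each
    index is floor(coordinate / cell_deg).
    """
    counts = Counter()
    for lat, lon in points:
        # Skip rows without coordinates (e.g. years before geocoding).
        if lat is None or lon is None or math.isnan(lat) or math.isnan(lon):
            continue
        # Skip out-of-range values, assumed here to be placeholder codes.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts.most_common(top_n)


# Made-up coordinates for illustration only.
sample_points = [
    (34.05, -118.25),
    (34.06, -118.24),
    (34.09, -118.21),
    (40.71, -74.01),
    (None, None),
    (float("nan"), -100.0),
    (99.9999, 999.9999),
]
top_cells = grid_hotspots(sample_points, top_n=1)
print(top_cells)
```

With the accidents table loaded in pandas, the input could be built with zip(accidents["latitude"], accidents["longitude"]).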
Schema
Three relational tables joining on st_case + year. Shown below: fars_accidents (primary table). Vehicles and Persons tables included in paid tiers.
| Field | Type | Description |
|---|---|---|
| st_case | int | Case number (state + sequence), unique within a data year |
| year | int | Data year (1975–2023) |
| state | int | State FIPS code |
| state_name | string | State name |
| crash_date | string | Crash date (YYYY-MM-DD) |
| crash_time | string | Crash time (HH:MM) |
| day_of_week | string | Day name |
| fatals | int | Number of fatalities |
| drunk_drivers | int | Number of drunk drivers |
| ve_total | int | Total vehicles involved |
| latitude | float | Decimal degrees (post-1999) |
| longitude | float | Decimal degrees (post-1999) |
| weather | int | Weather condition code |
| lgt_cond | int | Light condition code |
| man_col | int | Manner of collision code |
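The schema joins on st_case + year rather than st_case alone because case numbers are reused from one data year to the next. The sketch below shows the difference using tiny made-up frames. The per_no column and all values are hypothetical, and validate="many_to_one" is an optional pandas safeguard, not something the dataset requires.

```python
import pandas as pd

# Tiny made-up frames standing in for fars_accidents and fars_persons.
accidents = pd.DataFrame({
    "st_case": [10001, 10001, 10002],
    "year": [2021, 2022, 2022],
    "fatals": [1, 2, 1],
})
persons = pd.DataFrame({
    "st_case": [10001, 10001, 10001, 10002],
    "year": [2021, 2022, 2022, 2022],
    "per_no": [1, 1, 2, 1],  # hypothetical person-number column
})

# Joining on st_case alone pairs persons with crashes from other years.
wrong = persons.merge(accidents, on="st_case")

# Joining on the composite key keeps one crash per person row.
# validate raises pandas.errors.MergeError if the key is duplicated
# on the accidents side.
right = persons.merge(
    accidents,
    on=["st_case", "year"],
    how="left",
    validate="many_to_one",
)

print(len(persons), len(wrong), len(right))
```

A row count that grows after the merge, as it does for the st_case-only join here, is a quick sign that the key is incomplete.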
Quick Start

    import pandas as pd

    # Load accidents table
    accidents = pd.read_parquet("fars_accidents.parquet")

    # Fatalities by year
    print(accidents.groupby("year")["fatals"].sum())

    # Load the other two tables
    vehicles = pd.read_parquet("fars_vehicles.parquet")
    persons = pd.read_parquet("fars_persons.parquet")

    # Join persons to accidents on the composite key
    merged = persons.merge(
        accidents[["st_case", "year", "state_name", "crash_date"]],
        on=["st_case", "year"]
    )

Pricing
Data Provenance
Source: National Highway Traffic Safety Administration (NHTSA), US Department of Transportation
Portal: NHTSA FARS Data Portal
License: FARS is a US federal government work and is in the public domain under 17 U.S.C. 105. The processed dataset inherits this public domain status. Paid tiers are licensed under the ClarityStorm Commercial Data License for internal use, covering our pipeline and enrichment work.
Attribution: “FARS data sourced from NHTSA Fatality Analysis Reporting System, processed by ClarityStorm Data.”
Need custom data cuts, API access, or bulk licensing?
Contact Sales