NHTSA FARS 1975–2023

US Fatal Crash Database — AI-Ready Structured Dataset

49 years of US fatal motor vehicle crash data from the NHTSA Fatality Analysis Reporting System. Every crash on a US public road resulting in a death within 30 days — cleaned, standardised, and structured for AI/ML workflows. Three relational tables covering ~990K crash events, ~1.5M vehicles, and ~2.8M persons.

Sample: Public Domain | Paid: Commercial License | CSV + Parquet | ~5.3M records | 1975–2023

~990K

Fatal Crashes

49

Years

52

States

3

Tables

Use Cases

Fatality Prediction Models

Train classifiers to predict crash severity from road, vehicle, and environmental features across 49 years of data.

Geospatial Hotspot Analysis

Cluster fatal crashes by latitude and longitude (available for post-1999 records) to identify high-risk corridors and intersections nationwide.

Long-Term Safety Trend Analysis

Forty-nine years of continuous data support time-series modeling of fatality rates, drunk driving trends, and seatbelt adoption.

Drug & Alcohol Impairment Research

Detailed per-person and per-vehicle impairment indicators for public health and policy research.

Vehicle Safety Research

Link crash outcomes to vehicle type, model year, and restraint use to evaluate safety equipment effectiveness.

Insurance Actuarial Modeling

National-scale fatal crash data for risk scoring, underwriting models, and telematics calibration.
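As a rough illustration of the long-term trend use case above, the sketch below computes per-year crash counts, fatalities, and the share of crashes with at least one drunk driver. It uses only the year, fatals, and drunk_drivers fields from the accidents schema. The rows are invented stand-ins rather than real FARS values, and yearly_trend is a hypothetical helper name.

```python
import pandas as pd

# Tiny synthetic stand-in for fars_accidents; values are invented for illustration.
accidents = pd.DataFrame({
    "st_case": [10001, 10002, 10003, 10001, 10002],
    "year": [1982, 1982, 1982, 2019, 2019],
    "fatals": [1, 2, 1, 1, 3],
    "drunk_drivers": [1, 0, 2, 0, 1],
})


def yearly_trend(df: pd.DataFrame) -> pd.DataFrame:
    """Per-year crash count, total fatalities, and share of crashes
    involving at least one drunk driver (hypothetical helper)."""
    flagged = df.assign(any_drunk=df["drunk_drivers"] > 0)
    return flagged.groupby("year").agg(
        crashes=("st_case", "size"),
        fatalities=("fatals", "sum"),
        drunk_share=("any_drunk", "mean"),
    )


trend = yearly_trend(accidents)
print(trend)
```

With the real table, the same function should work on the output of pd.read_parquet("fars_accidents.parquet"), assuming the column names match the published schema.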

Schema

The three relational tables join on st_case + year. The schema below covers fars_accidents, the primary table; the Vehicles and Persons tables are included in paid tiers.

Field          Type    Description
st_case        int     Unique case number (state + sequence)
year           int     Data year (1975–2023)
state          int     State FIPS code
state_name     string  State name
crash_date     string  Crash date (YYYY-MM-DD)
crash_time     string  Crash time (HH:MM)
day_of_week    string  Day name
fatals         int     Number of fatalities
drunk_drivers  int     Number of drunk drivers
ve_total       int     Total vehicles involved
latitude       float   Decimal degrees (post-1999)
longitude      float   Decimal degrees (post-1999)
weather        int     Weather condition code
lgt_cond       int     Light condition code
man_col        int     Manner of collision code
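Because the tables join on st_case + year, the case number alone is presumably not unique across data years. The sketch below shows one way to check this before joining, using pandas' validate option to fail fast if the accidents side has duplicate keys. The frames are synthetic, and per_no is a hypothetical column name for the persons table, not one confirmed by the schema above.

```python
import pandas as pd

# Synthetic rows: case number 10001 appears in two different data years.
accidents = pd.DataFrame({
    "st_case": [10001, 10002, 10001],
    "year": [2021, 2021, 2022],
    "fatals": [1, 2, 1],
})
persons = pd.DataFrame({
    "st_case": [10001, 10001, 10002, 10001],
    "year": [2021, 2021, 2021, 2022],
    "per_no": [1, 2, 1, 1],  # hypothetical person-number column
})

key = ["st_case", "year"]

# st_case by itself repeats across years; the composite key does not.
st_case_alone_unique = accidents["st_case"].is_unique
composite_unique = not accidents.duplicated(subset=key).any()
print(st_case_alone_unique, composite_unique)

# validate="many_to_one" raises MergeError if the accidents key has duplicates.
merged = persons.merge(accidents, on=key, how="left", validate="many_to_one")
print(merged)
```

Joining on st_case alone in this toy data would attach the 2022 crash to 2021 persons, which is the failure mode the composite key avoids.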

Quick Start

import pandas as pd

# Load accidents table
accidents = pd.read_parquet("fars_accidents.parquet")

# Fatalities by year
print(accidents.groupby("year")["fatals"].sum())

# Load the vehicles and persons tables (included in paid tiers)
vehicles = pd.read_parquet("fars_vehicles.parquet")
persons  = pd.read_parquet("fars_persons.parquet")

# Join persons to accidents
merged = persons.merge(
    accidents[["st_case", "year", "state_name", "crash_date"]],
    on=["st_case", "year"]
)
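Since latitude and longitude are only populated for post-1999 records, geospatial work needs to drop unlocated rows first. The sketch below bins located crashes into a coarse grid and ranks the cells by crash count. It assumes missing coordinates are stored as NaN, which should be checked against the actual files. The rows are synthetic, and grid_hotspots and cell_deg are hypothetical names.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for fars_accidents; the 1995 row has no coordinates.
accidents = pd.DataFrame({
    "year": [1995, 2010, 2010, 2015, 2020],
    "latitude": [np.nan, 34.052, 34.058, 40.713, 34.049],
    "longitude": [np.nan, -118.244, -118.249, -74.006, -118.241],
    "fatals": [1, 1, 2, 1, 1],
})


def grid_hotspots(df: pd.DataFrame, cell_deg: float = 0.1) -> pd.DataFrame:
    """Count crashes and fatalities per grid cell of size cell_deg degrees."""
    located = df.dropna(subset=["latitude", "longitude"])
    cells = located.assign(
        # Integer cell indices avoid floating-point noise in the group keys.
        lat_idx=np.floor(located["latitude"] / cell_deg).astype(int),
        lon_idx=np.floor(located["longitude"] / cell_deg).astype(int),
    )
    return (
        cells.groupby(["lat_idx", "lon_idx"])
        .agg(crashes=("fatals", "size"), fatalities=("fatals", "sum"))
        .sort_values("crashes", ascending=False)
        .reset_index()
    )


hotspots = grid_hotspots(accidents)
print(hotspots)
```

A fixed grid is only a first pass. Density-based clustering or road-network snapping would likely be a better fit for corridor-level analysis.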

Pricing

Sample

Free

1,000 rows (CSV) + schema docs

Public Domain

Download Sample

Complete

$99

All 3 tables (Accidents, Vehicles, Persons) — CSV + Parquet

Commercial License

Buy Complete

Annual

$249/yr

All files + annual updates when NHTSA releases new FARS data

Commercial License

Subscribe

Data Provenance

Source: National Highway Traffic Safety Administration (NHTSA), US Department of Transportation

Portal: NHTSA FARS Data Portal

License: FARS is a US federal government work and is in the public domain under 17 U.S.C. § 105. The processed dataset inherits this public domain status. Paid tiers are licensed for internal use under the ClarityStorm Commercial Data License, which covers our pipeline and enrichment work.

Attribution: “FARS data sourced from NHTSA Fatality Analysis Reporting System, processed by ClarityStorm Data.”

Need custom data cuts, API access, or bulk licensing?

Contact Sales