NHTSA FARS 1975–2023
US Fatal Crash Database — AI-Ready Structured Dataset
49 years of US fatal motor vehicle crash data from the NHTSA Fatality Analysis Reporting System (FARS). Every crash involving a motor vehicle on a US public road that resulted in a death within 30 days, cleaned, standardised, and structured for AI/ML workflows. Three relational tables cover ~990K crash events, ~1.5M vehicles, and ~2.8M persons.
~990K fatal crashes · 49 years · 52 states · 3 tables
Use Cases
Train models to predict crash outcomes, such as the number of fatalities, from road, vehicle, and environmental features across 49 years of data.
Cluster fatal crashes by latitude/longitude (available for post-1999 records) to identify high-risk corridors and intersections nationwide.
49 years of continuous data enables time-series modeling of fatality rates, drunk driving trends, and seatbelt adoption.
Detailed per-person and per-vehicle impairment indicators for public health and policy research.
Link crash outcomes to vehicle type, model year, and restraint use to evaluate safety equipment effectiveness.
National-scale fatal crash data for risk scoring, underwriting models, and telematics calibration.
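For the geospatial use case, a simple starting point is to snap each crash to a fixed lat/lon grid and count crashes per cell. The sketch below does this with pandas. It is grid binning, a deliberately simple stand-in for a true clustering method such as DBSCAN, and it assumes missing coordinates are stored as NaN in `fars_accidents`. The function name `grid_hotspots` and the default 0.01° cell (roughly 1.1 km of latitude) are illustrative choices, not part of the dataset.

```python
import pandas as pd


def grid_hotspots(accidents: pd.DataFrame, cell_deg: float = 0.01, top_n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: count fatal crashes per lat/lon grid cell.

    Each crash is snapped to the south-west corner of a cell_deg x cell_deg
    cell; the busiest cells are returned first.
    """
    # Assumption: rows without coordinates (e.g. pre-1999) hold NaN and are dropped.
    geo = accidents.dropna(subset=["latitude", "longitude"])
    cells = pd.DataFrame({
        "lat_cell": (geo["latitude"] // cell_deg) * cell_deg,
        "lon_cell": (geo["longitude"] // cell_deg) * cell_deg,
        "fatals": geo["fatals"],
    })
    return (
        cells.groupby(["lat_cell", "lon_cell"], as_index=False)
        .agg(crashes=("fatals", "size"), fatalities=("fatals", "sum"))
        .sort_values("crashes", ascending=False)
        .head(top_n)
        .reset_index(drop=True)
    )
```

Usage: `grid_hotspots(accidents, cell_deg=0.005)` gives a finer grid. Note that a fixed-degree cell covers less east-west distance at higher latitudes.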
Schema
The three relational tables join on st_case + year. The table below documents fars_accidents, the primary table; the Vehicles and Persons tables are included in paid tiers.
| Field | Type | Description |
|---|---|---|
| st_case | int | Case number (state code + sequence); unique only within a year, so combine with year |
| year | int | Data year (1975–2023) |
| state | int | State FIPS code |
| state_name | string | State name |
| crash_date | string | Crash date (YYYY-MM-DD) |
| crash_time | string | Crash time (HH:MM) |
| day_of_week | string | Day name |
| fatals | int | Number of fatalities |
| drunk_drivers | int | Number of drunk drivers |
| ve_total | int | Total vehicles involved |
| latitude | float | Decimal degrees (post-1999) |
| longitude | float | Decimal degrees (post-1999) |
| weather | int | Weather condition code |
| lgt_cond | int | Light condition code |
| man_col | int | Manner of collision code |
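Because crash_date and crash_time are stored as separate strings, time-of-day analysis usually starts by combining them into one timestamp. The sketch below assumes the formats listed in the schema (YYYY-MM-DD and HH:MM) and uses `errors="coerce"` so that any value that does not parse, such as an unknown time, becomes NaT instead of raising. The helper name and the `crash_ts` / `hour` output columns are hypothetical.

```python
import pandas as pd


def add_crash_timestamp(accidents: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build crash_ts and hour from crash_date + crash_time."""
    out = accidents.copy()
    combined = out["crash_date"] + " " + out["crash_time"]
    # Unparseable or missing date/time strings become NaT rather than raising.
    out["crash_ts"] = pd.to_datetime(combined, format="%Y-%m-%d %H:%M", errors="coerce")
    # Hour of day (0-23); NaN where the timestamp could not be parsed.
    out["hour"] = out["crash_ts"].dt.hour
    return out
```

A typical follow-up is `add_crash_timestamp(accidents)["hour"].value_counts().sort_index()` to see how fatal crashes are distributed across the day.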
Quick Start
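```python
import pandas as pd

# Load the accidents table (reading Parquet requires pyarrow or fastparquet)
accidents = pd.read_parquet("fars_accidents.parquet")

# Total fatalities by year
print(accidents.groupby("year")["fatals"].sum())

# Load the vehicles and persons tables (paid tiers)
vehicles = pd.read_parquet("fars_vehicles.parquet")
persons = pd.read_parquet("fars_persons.parquet")

# Join persons to their crash. st_case is only unique within a year, so join on both keys;
# validate="many_to_one" raises if a (st_case, year) pair repeats in accidents.
merged = persons.merge(
    accidents[["st_case", "year", "state_name", "crash_date"]],
    on=["st_case", "year"],
    validate="many_to_one",
)
```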
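Extending the Quick Start toward the drunk-driving trend use case, the sketch below computes, per year, the number of crashes, total fatalities, and the share of crashes with at least one drunk driver. It uses only accidents-table columns from the schema and assumes drunk_drivers is populated consistently across years; the function name is hypothetical.

```python
import pandas as pd


def drunk_driver_share_by_year(accidents: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: yearly crash count, fatalities, and drunk-driver share."""
    # Flag crashes with one or more drunk drivers.
    flagged = accidents.assign(any_drunk=accidents["drunk_drivers"] > 0)
    return flagged.groupby("year").agg(
        crashes=("st_case", "size"),
        fatalities=("fatals", "sum"),
        # Mean of a boolean column = fraction of crashes that are flagged.
        drunk_share=("any_drunk", "mean"),
    )
```

The result is indexed by year, so `drunk_driver_share_by_year(accidents)["drunk_share"].plot()` gives a quick trend line.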
Data Provenance
Source: National Highway Traffic Safety Administration (NHTSA), US Department of Transportation
Portal: NHTSA FARS Data Portal
License: FARS is a work of the US federal government and is in the public domain under 17 U.S.C. § 105. The processed dataset inherits this public-domain status. Paid tiers are licensed for internal use under the ClarityStorm Commercial Data License, which covers our pipeline and enrichment work.
Attribution: “FARS data sourced from NHTSA Fatality Analysis Reporting System, processed by ClarityStorm Data.”
Need custom data cuts, API access, or bulk licensing?
Contact Sales