
How to Analyze US Traffic Fatality Trends with FARS Data

Learn how to load and analyze 49 years of US fatal crash data from NHTSA FARS using Python and pandas. Discover fatality trends, DUI patterns, and geospatial hotspots.

FARS · traffic safety · Python · pandas · geospatial · tutorial

The NHTSA Fatality Analysis Reporting System (FARS) is one of the richest public safety datasets in existence: every fatal motor vehicle crash on a US public road since 1975, with details on the crash, each vehicle involved, and every person in those vehicles. That's 49 years and roughly 5.3 million records across three linked tables. If you work in road safety research, insurance, automotive engineering, or public policy, this dataset deserves your attention.

In this tutorial we'll load the cleaned ClarityStorm FARS dataset, compute fatality trends over time, profile drunk-driving crashes, and plot a geospatial heatmap of the deadliest corridors in the continental US. All in about 50 lines of Python.

What's in the Dataset

The ClarityStorm FARS release ships three tables as both CSV and Parquet: accidents (one row per crash), vehicles (one row per vehicle in each crash), and persons (one row per occupant or pedestrian). The accidents table is the backbone — it contains crash date, time, location (lat/lon for post-1999 records), weather, light conditions, manner of collision, total fatalities, and drunk-driver count.

  • ~860K crash events (accidents table)
  • ~1.6M vehicle records (vehicles table)
  • ~2.8M person records (persons table)
  • Lat/lon coordinates from 2000 onward for geospatial analysis
  • Consistent schema spanning 1975–2023 — ideal for long-horizon time series
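As a quick sketch of how the tables relate, here is a one-to-many join on toy frames. Using st_case plus year as the join key, and the body_type column, are assumptions about the cleaned schema (FARS case numbers repeat across years, so the year is included in the key):

```python
import pandas as pd

# Toy frames mirroring the FARS layout; values are illustrative only
accidents = pd.DataFrame({
    "st_case": [101, 102],
    "year": [2021, 2021],
    "fatals": [1, 2],
})
vehicles = pd.DataFrame({
    "st_case": [101, 101, 102],
    "year": [2021, 2021, 2021],
    "body_type": ["sedan", "SUV", "pickup"],
})

# One row per vehicle, with crash-level fields attached to each
crash_vehicles = vehicles.merge(accidents, on=["st_case", "year"], how="left")
print(crash_vehicles)
```

The same pattern extends to the persons table, giving person-level rows annotated with both vehicle and crash attributes.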

Loading the Data

The Parquet files load roughly 10× faster than CSVs and consume less memory. If you're working with the full dataset, prefer Parquet. The free sample (a 1,000-row CSV) is enough to validate your pipeline before purchasing.

python
import pandas as pd

# Load accidents table (Parquet preferred for full dataset)
accidents = pd.read_parquet("fars_accidents.parquet")

print(f"Rows: {len(accidents):,}")
print(accidents.dtypes)
print(accidents[["year", "state_name", "fatals", "drunk_drivers"]].head(10))
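If you start from the CSV sample instead, passing explicit dtypes keeps memory close to what Parquet gives you. The in-memory buffer below stands in for the sample file (whose exact filename isn't specified here):

```python
import io
import pandas as pd

# Stand-in for the 1,000-row CSV sample; rows are illustrative
csv_text = (
    "year,state_name,fatals,drunk_drivers\n"
    "2022,Texas,1,0\n"
    "2022,Ohio,2,1\n"
)

# Explicit dtypes: small ints for counts, category for repeated state names
sample = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"year": "int16", "state_name": "category",
           "fatals": "int8", "drunk_drivers": "int8"},
)
print(sample.dtypes)
```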

Fatality Trends Over Time

One of the most compelling stories in FARS data is the long-term decline in US traffic fatalities despite rising vehicle miles traveled. From a peak of over 54,000 deaths in 1972 (pre-FARS, estimated) to around 40,000 per year today, decades of safety regulation, airbag mandates, and drunk-driving laws show clear effects in the data.

python
import matplotlib.pyplot as plt

# Annual fatalities
annual = (
    accidents
    .groupby("year")["fatals"]
    .sum()
    .reset_index()
    .rename(columns={"fatals": "total_fatalities"})
)

plt.figure(figsize=(12, 5))
plt.plot(annual["year"], annual["total_fatalities"], color="#0ea5e9", linewidth=2)
plt.fill_between(annual["year"], annual["total_fatalities"], alpha=0.15, color="#0ea5e9")
plt.title("US Traffic Fatalities 1975–2023 (FARS)", fontsize=14, fontweight="bold")
plt.xlabel("Year")
plt.ylabel("Total Fatalities")
plt.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.savefig("fatalities_trend.png", dpi=150)
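Summary statistics fall straight out of the annual series. A sketch with illustrative round numbers standing in for the real FARS totals:

```python
import pandas as pd

# Toy stand-in for the `annual` frame; fatality counts are illustrative,
# not actual FARS figures
annual = pd.DataFrame({
    "year": [1975, 2000, 2023],
    "total_fatalities": [44525, 41945, 40990],
})

# Relative decline from the first to the last year in the series
first = annual["total_fatalities"].iloc[0]
last = annual["total_fatalities"].iloc[-1]
decline = (first - last) / first
print(f"Decline since {annual['year'].iloc[0]}: {decline:.1%}")
```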

Drunk-Driving Analysis

FARS records the number of drunk drivers involved in each fatal crash. Combining this with crash year gives a clear picture of how anti-DUI legislation (national 0.08 BAC standard in 2000, ignition interlock laws, rideshare adoption post-2012) has reduced alcohol-impaired fatality rates.

python
# DUI involvement rate by year
dui_trend = accidents.groupby("year").agg(
    total_crashes=("st_case", "count"),
    dui_crashes=("drunk_drivers", lambda x: (x > 0).sum()),
).reset_index()

dui_trend["dui_rate"] = dui_trend["dui_crashes"] / dui_trend["total_crashes"]

# Peak vs. recent comparison
peak_year = dui_trend.loc[dui_trend["dui_rate"].idxmax()]
recent = dui_trend[dui_trend["year"] >= 2018]["dui_rate"].mean()
print(f"Peak DUI rate: {peak_year['dui_rate']:.1%} in {int(peak_year['year'])}")
print(f"Average DUI rate 2018-2023: {recent:.1%}")

Geospatial Hotspot Mapping

Post-1999 records include decimal lat/lon, enabling geospatial analysis. The following snippet clusters crash locations into a density heatmap. With the full dataset you can identify specific highway segments, intersections, or counties with disproportionate fatality concentrations.

python
import folium
from folium.plugins import HeatMap

# Filter to post-1999 with valid coordinates
geo = accidents[
    (accidents["year"] >= 2000)
    & accidents["latitude"].between(24, 50)
    & accidents["longitude"].between(-130, -65)
][["latitude", "longitude", "fatals"]].dropna()

# Build heatmap — weight each point by fatalities
heat_data = geo[["latitude", "longitude", "fatals"]].values.tolist()

m = folium.Map(location=[39.5, -98.35], zoom_start=4, tiles="CartoDB dark_matter")
HeatMap(heat_data, radius=4, blur=6, max_zoom=8).add_to(m)
m.save("fars_heatmap.html")
print("Heatmap saved to fars_heatmap.html")
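To move from a visual heatmap toward ranked hotspots, one simple approach is snapping coordinates to a coarse grid and summing fatalities per cell. A sketch with toy points (the 0.5° cell size is an arbitrary choice; real analysis would use the post-1999 `geo` frame):

```python
import pandas as pd

# Toy crash points; coordinates and fatality counts are illustrative
geo = pd.DataFrame({
    "latitude":  [33.75, 33.76, 33.74, 40.71, 40.72],
    "longitude": [-84.39, -84.40, -84.38, -74.01, -74.00],
    "fatals":    [1, 2, 1, 1, 3],
})

# Snap each point down to a 0.5-degree grid cell, then rank cells
geo["lat_bin"] = (geo["latitude"] // 0.5) * 0.5
geo["lon_bin"] = (geo["longitude"] // 0.5) * 0.5
hotspots = (
    geo.groupby(["lat_bin", "lon_bin"])["fatals"]
    .sum()
    .sort_values(ascending=False)
)
print(hotspots.head())
```

On the full dataset, the top cells can then be cross-referenced against county or highway identifiers to name the corridors.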

What to Build Next

With the full FARS dataset you can go further: train a gradient-boosted classifier to predict crash severity from environmental and road features, build a county-level fatality index for insurance underwriting, or construct a time-series model forecasting annual fatality rates by state. The three-table structure (accidents + vehicles + persons) enables rich join-based feature engineering that single-table datasets can't match.
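As a minimal sketch of that join-based feature engineering, aggregate the persons table up to crash level and attach the result to the accidents table. The age column and all values here are toy illustrations, not the actual schema:

```python
import pandas as pd

# Toy tables; the real pipeline would use the full three-table release
accidents = pd.DataFrame({"st_case": [1, 2], "fatals": [1, 3]})
persons = pd.DataFrame({
    "st_case": [1, 1, 2, 2, 2, 2],
    "age": [34, 29, 51, 8, 45, 62],
})

# Crash-level features derived from person rows: occupant count, mean age
features = persons.groupby("st_case").agg(
    n_persons=("age", "size"),
    mean_age=("age", "mean"),
).reset_index()

# One modeling row per crash, ready for a severity classifier
model_frame = accidents.merge(features, on="st_case", how="left")
print(model_frame)
```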

The free sample contains 1,000 rows. The complete dataset ships all three tables (5.3M+ records) as CSV and Parquet with a commercial license for production use.

Get the Full Dataset

NHTSA FARS 1975–2023

From $99