Every Aviation Accident Since 1982: What NTSB Data Reveals

Commercial aviation is statistically the safest form of transportation per mile traveled — yet accidents still happen, and when they do, the NTSB investigates exhaustively. The National Transportation Safety Board has recorded every US aviation accident since 1982 in its accident database: over 30,000 events covering general aviation, commercial carriers, and everything in between. Each record links the accident event to the aircraft involved, the injuries sustained, the probable causes found, and in many cases a full narrative of what went wrong.

This tutorial walks through loading the ClarityStorm NTSB Aviation dataset, computing accident rate trends, profiling high-risk aircraft categories, and extracting patterns from the findings table.

Six Tables, One Accident Database

The NTSB database is relational. ClarityStorm ships all six core tables with consistent schema and primary key linkage: events (one row per accident), aircraft (one row per aircraft in each event), engines, narratives (full-text cause descriptions), findings (structured probable cause codes), and occurrences (phase-of-flight and occurrence type flags).

events: ~30K accident records, 1982–present
aircraft: aircraft type, category, make, model, amateur-built flag
narratives: full-text probable cause and factual narratives
findings: structured cause codes (human factor, mechanical, environmental)
occurrences: phase-of-flight and occurrence sequence
engines: engine type, number, and horsepower

Loading and Joining the Tables

python

import pandas as pd

events   = pd.read_parquet("ntsb_events.parquet")
aircraft = pd.read_parquet("ntsb_aircraft.parquet")
findings = pd.read_parquet("ntsb_findings.parquet")

print(f"Events:   {len(events):,}")
print(f"Aircraft: {len(aircraft):,}")
print(f"Findings: {len(findings):,}")

# Primary join: events ← aircraft via ev_id
df = events.merge(aircraft, on="ev_id", how="left")
print(f"Joined rows: {len(df):,}")

# Quick profile
print(df.groupby("ev_highest_injury")["ev_id"].count())

Accident Rate Trends

The story the NTSB data tells is one of consistent improvement. General aviation accident rates have fallen sharply since the 1980s, driven by improved avionics (GPS, terrain-awareness systems), better pilot training standards, and airspace modernization. The data makes this visible.

python

import matplotlib.pyplot as plt

annual = (
    events.groupby("ev_year")
    .agg(
        total_events=("ev_id", "count"),
        fatal_events=("ev_highest_injury", lambda x: (x == "FATAL").sum()),
    )
    .reset_index()
)
annual["fatal_rate"] = annual["fatal_events"] / annual["total_events"]

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

ax1.bar(annual["ev_year"], annual["total_events"], color="#0ea5e9", alpha=0.7)
ax1.set_title("Total Aviation Accidents per Year (NTSB 1982–Present)")
ax1.set_ylabel("Accidents")

ax2.plot(annual["ev_year"], annual["fatal_rate"], color="#ef4444", linewidth=2)
ax2.fill_between(annual["ev_year"], annual["fatal_rate"], alpha=0.2, color="#ef4444")
ax2.set_title("Fatal Accident Rate")
ax2.set_ylabel("Fraction Fatal")
ax2.set_xlabel("Year")

plt.tight_layout()
plt.savefig("ntsb_trend.png", dpi=150)

Profiling High-Risk Aircraft Categories

Not all aircraft categories carry equal risk. Single-engine piston aircraft dominate the accident count by volume (they also dominate the general aviation fleet), but rotorcraft and amateur-built aircraft show disproportionate fatal-accident rates. The aircraft table makes this breakdown straightforward.

python

# Fatal accident rate by aircraft category
category_risk = (
    df.groupby("acft_category")
    .agg(
        total=("ev_id", "count"),
        fatal=("ev_highest_injury", lambda x: (x == "FATAL").sum()),
    )
    .assign(fatal_rate=lambda d: d["fatal"] / d["total"])
    .sort_values("fatal_rate", ascending=False)
    .query("total >= 100")  # filter low-sample categories
)

print(category_risk[["total", "fatal", "fatal_rate"]].to_string())

Mining Probable Cause Findings

The findings table encodes the NTSB's structured causal analysis for each accident: human performance issues, mechanical failures, environmental factors, and their relationships. Aggregating findings lets you quantify how often pilot error, maintenance failures, or weather conditions appear as causes — and how those proportions have shifted over four decades.

python

# Top finding categories
cause_counts = (
    findings.groupby("finding_code")["ev_id"]
    .count()
    .sort_values(ascending=False)
    .head(20)
    .reset_index()
    .rename(columns={"ev_id": "occurrences"})
)
print(cause_counts)

# Merge with narratives for a specific finding
pilot_error = findings[findings["finding_code"].str.startswith("22")]  # human performance
print(f"Human performance findings: {len(pilot_error):,}")

The free sample includes 1,000 rows from the events table. The complete dataset ships all six tables (events, aircraft, narratives, findings, occurrences, engines) as CSV and Parquet.

Six Tables, One Accident Database

Loading and Joining the Tables

Accident Rate Trends

Profiling High-Risk Aircraft Categories

Mining Probable Cause Findings

Get the Full Dataset