NTSB Aviation Accidents 1982–Present
National Transportation Safety Board Aviation Accident Database
Every aviation accident and incident investigated by the NTSB since 1982 — cleaned, structured, and ready for analysis. Includes 6 relational tables covering events, aircraft, probable cause narratives, causal findings, crew experience, and engines. 30K+ accidents across four decades of US aviation.
~30K Accidents
40+ Years
6 Tables
~28K Narratives
Included Tables
All tables are joined on ev_id (event identifier).
Events (~30K rows): One row per accident/incident, with dates, location, injury counts, weather, and phase of flight.
Aircraft (~31K rows): Aircraft involved in each event, with make, model, year, engine count, and damage level.
Narratives (~28K rows): Full narrative text and probable cause statements written by NTSB investigators.
Findings (~71K rows): Causal and contributing findings per event, with factor codes and descriptions.
Crew (~32K rows): Pilot and crew records, with certificate level, total hours, and hours in type.
Engines (~28K rows): Engine details per aircraft, with type, manufacturer, and horsepower.
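
Because some tables hold more than one row per event (findings has roughly 71K rows against roughly 30K events), a direct merge on ev_id repeats event rows. The following is a minimal sketch of one way to handle that, using small invented frames rather than the real files. The `finding_description` column name is a placeholder and not a documented field, so check the actual findings schema before relying on it.

```python
import pandas as pd

# Toy stand-ins for the events and findings tables (all values are invented).
events = pd.DataFrame({
    "ev_id": ["E1", "E2", "E3"],
    "ev_year": [1999, 2005, 2012],
})
findings = pd.DataFrame({
    "ev_id": ["E1", "E1", "E2"],
    # Hypothetical column name, used only for illustration.
    "finding_description": ["Fuel exhaustion", "Preflight planning", "Crosswind"],
})

# Collapse the many-side to one row per event before joining.
finding_counts = (
    findings.groupby("ev_id").size().rename("n_findings").reset_index()
)

# validate= raises MergeError if either side has duplicate ev_id values.
merged = events.merge(
    finding_counts, on="ev_id", how="left", validate="one_to_one"
)

# Events with no findings come back as NaN after the left join.
merged["n_findings"] = merged["n_findings"].fillna(0).astype(int)
print(merged)
```

Aggregating first keeps the result at one row per event, which matters for any count or rate computed afterwards.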
Use Cases
Analyze accident rates by aircraft type, phase of flight, weather condition, and geography over four decades of US aviation history.
Mine ~71K findings records to identify top contributing factors (pilot error, mechanical failure, weather) and how they interact.
~28K free-text probable cause statements written by NTSB investigators: a rich corpus for NLP, summarization, and classification tasks.
Plot accident locations with lat/lon coordinates to identify high-risk corridors, airports, and terrain features.
Correlate crew experience (total hours, hours in type) with accident outcomes to build pilot risk scoring models.
Score aircraft makes, models, and operation types by historical accident rates, injury severity, and total loss frequency.
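
As an illustration of the first and last use cases above, here is a hedged sketch that computes the share of events with at least one fatality per weather condition. It uses invented rows with column names taken from the events schema. The tables described here do not include flight hours or departures, so the result is a share among recorded events and not a rate per hour flown.

```python
import pandas as pd

# Invented rows using column names from the events schema.
events = pd.DataFrame({
    "ev_id": ["E1", "E2", "E3", "E4", "E5"],
    "wx_cond_basic": ["VMC", "VMC", "IMC", "IMC", "VMC"],
    "inj_tot_f": [0, 1, 2, 0, 0],
})

# Flag events with at least one fatality.
events["is_fatal"] = events["inj_tot_f"] > 0

# Count events and take the mean of the boolean flag per weather condition.
summary = (
    events.groupby("wx_cond_basic")
    .agg(n_events=("ev_id", "count"), fatal_share=("is_fatal", "mean"))
    .sort_values("fatal_share", ascending=False)
)
print(summary)
```

The same pattern applies to `phase_flt_spec`, `light_cond`, or `ev_state` by swapping the grouping column.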
Events Table Schema
Primary table: ntsb_aviation_events.parquet / ntsb_aviation_events.csv, with one row per event. Join to other tables on ev_id.
| Field | Type | Description |
|---|---|---|
| ev_id | string | NTSB event identifier (primary key) |
| ntsb_no | string | NTSB accident number |
| ev_date | string | Accident date (YYYY-MM-DD) |
| ev_year | int | Accident year (derived) |
| ev_time | string | Accident time (HH:MM, local) |
| ev_city | string | City of occurrence |
| ev_state | string | US state (2-letter code) |
| ev_country | string | Country code |
| ev_type | string | Accident vs. incident |
| ev_highest_injury | string | Highest injury level (fatal/serious/minor/none) |
| inj_tot_f | int | Total fatalities |
| inj_tot_s | int | Total serious injuries |
| inj_tot_m | int | Total minor injuries |
| inj_tot_n | int | Total uninjured |
| total_injuries | int | Sum of fatal + serious + minor (derived) |
| acft_fire | int | Aircraft fire involved (1/0) |
| acft_explode | int | Aircraft explosion (1/0) |
| latitude | float | Decimal latitude of accident site |
| longitude | float | Decimal longitude of accident site |
| apt_id | string | Nearest airport identifier (FAA/ICAO) |
| apt_name | string | Nearest airport name |
| wx_cond_basic | string | Basic weather conditions (VMC/IMC) |
| light_cond | string | Light conditions (day/night/dusk/dawn) |
| phase_flt_spec | string | Phase of flight (takeoff/cruise/landing, etc.) |
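
Since ev_date is stored as a string and total_injuries is a derived field, a quick consistency check after loading can be useful. The sketch below runs on a few invented rows. It assumes the injury columns contain no nulls, which may not hold in the real data.

```python
import pandas as pd

# Invented rows that follow the events schema above.
events = pd.DataFrame({
    "ev_id": ["E1", "E2", "E3"],
    "ev_date": ["1999-07-04", "2005-12-31", "not-a-date"],
    "inj_tot_f": [0, 1, 0],
    "inj_tot_s": [1, 0, 0],
    "inj_tot_m": [2, 0, 0],
    "total_injuries": [3, 1, 0],
})

# Parse the string dates; values that do not match become NaT.
events["ev_date_parsed"] = pd.to_datetime(
    events["ev_date"], format="%Y-%m-%d", errors="coerce"
)

# Recompute the derived field as fatal + serious + minor and compare.
recomputed = events[["inj_tot_f", "inj_tot_s", "inj_tot_m"]].sum(axis=1)
mismatches = events[recomputed != events["total_injuries"]]
bad_dates = events[events["ev_date_parsed"].isna()]

print(f"{len(mismatches)} rows with inconsistent total_injuries")
print(f"{len(bad_dates)} rows with unparseable ev_date")
```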
Quick Start
```python
import pandas as pd

events = pd.read_parquet("ntsb_aviation_events.parquet")
narratives = pd.read_parquet("ntsb_aviation_narratives.parquet")

# Fatal accidents by year
fatal = events[events["inj_tot_f"] > 0]
print(fatal.groupby("ev_year")["inj_tot_f"].sum())

# Accidents by phase of flight
print(events["phase_flt_spec"].value_counts().head(10))

# Join events with probable cause narratives
df = events.merge(narratives[["ev_id", "narr_cause"]], on="ev_id", how="left")
print(df[["ev_date", "ev_state", "ev_highest_injury", "narr_cause"]].head(5))
```
Pricing
$79
Full dataset: all tables (events, aircraft, narratives, findings, crew, engines), CSV + Parquet
Commercial License
Buy Complete
Data Provenance
Source: National Transportation Safety Board (NTSB), Aviation Accident Database
Portal: NTSB Aviation Data
Format: Source is a Microsoft Access MDB file (avall.zip). Our pipeline extracts all tables using mdb-tools, normalises column names, parses dates, and outputs clean CSV + Parquet.
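
To give a rough idea of what the column-name normalisation and date-parsing steps could look like, here is a minimal standard-library sketch. It is not the actual pipeline. The helper names, the raw column names, and the `MM/DD/YYYY HH:MM:SS` input format are all assumptions made for illustration. The extraction step itself (for example with the mdb-export command from mdb-tools) is left out.

```python
import re
from datetime import datetime
from typing import Optional


def normalise_column(name: str) -> str:
    """Lower-case a name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def parse_date(raw: str, fmt: str = "%m/%d/%Y %H:%M:%S") -> Optional[str]:
    """Return an ISO date (YYYY-MM-DD), or None if raw does not match fmt.

    The default fmt is an assumed export format, not a documented one.
    """
    try:
        return datetime.strptime(raw.strip(), fmt).date().isoformat()
    except ValueError:
        return None


# Hypothetical raw column names as they might appear in an exported CSV.
columns = ["Ev ID", "EV_DATE", " Inj-Tot-F "]
cleaned = [normalise_column(c) for c in columns]

print(cleaned)
print(parse_date("07/04/1999 00:00:00"))
print(parse_date(""))
```

Returning None for unparseable dates, instead of raising, lets a pipeline count and report bad values without stopping on the first one.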
Update frequency: NTSB updates the database as investigations are completed. Annual subscribers receive yearly refreshes.
License: NTSB accident data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (table extraction, date standardisation, coordinate parsing, derived fields).
Need custom data cuts, API access, or bulk licensing?
Contact Sales