
NTSB Aviation Accidents 1982–Present

National Transportation Safety Board Aviation Accident Database

Every aviation accident and incident investigated by the NTSB since 1982, cleaned, structured, and ready for analysis. Includes 6 relational tables covering events, aircraft, probable cause narratives, causal findings, crew experience, and engines. Roughly 30K accidents across four decades of US aviation.

Sample: Public Domain · Paid: Commercial License · CSV + Parquet · ~30K accidents · 1982–Present · 6 tables

~30K

Accidents

40+

Years

6

Tables

~28K

Narratives

Included Tables

All tables share the ev_id (event identifier) column and join to one another on it.

events

~30K rows

One row per accident/incident — dates, location, injury counts, weather, phase of flight.

aircraft

~31K rows

Aircraft involved in each event — make, model, year, engine count, damage level.

narratives

~28K rows

Full narrative text and probable cause statements written by NTSB investigators.

findings

~71K rows

Causal and contributing findings per event — factor codes and descriptions.

flight_crew

~32K rows

Pilot and crew records — certificate level, total hours, hours in type.

engines

~28K rows

Engine details per aircraft — type, manufacturer, horsepower.
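Because aircraft (~31K rows) and findings (~71K rows) can hold several rows per event, a plain left merge from events repeats each event once per child row and inflates sums such as inj_tot_f. A minimal sketch of one way around this, assuming only the shared ev_id column: aggregate the child table to one row per event before merging. The helper name attach_child_counts is hypothetical.

```python
import pandas as pd


def attach_child_counts(events: pd.DataFrame, child: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return a copy of events with one extra column counting matching rows in child.

    Aggregating before the merge keeps exactly one row per ev_id, so sums over
    event-level columns (e.g. inj_tot_f) are not double-counted.
    """
    # One row per ev_id: how many child rows reference that event.
    counts = child.groupby("ev_id").size().rename(name).reset_index()
    # validate="one_to_one" raises if ev_id is not unique on either side.
    out = events.merge(counts, on="ev_id", how="left", validate="one_to_one")
    # Events with no matching child rows get a count of 0 instead of NaN.
    out[name] = out[name].fillna(0).astype(int)
    return out
```

For example, attach_child_counts(events, aircraft, "n_aircraft") adds an aircraft count per event while keeping the events table at one row per accident.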

Use Cases

Aviation Safety Research

Analyze accident rates by aircraft type, phase of flight, weather condition, and geography over four decades of US aviation history.
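A minimal sketch of a first cut for this kind of analysis, using the documented events columns phase_flt_spec and wx_cond_basic. Note that it produces counts, not rates: turning them into rates would need exposure data such as hours flown, which none of the tables above include. The function name is hypothetical.

```python
import pandas as pd


def counts_by_phase_and_weather(events: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Count events by phase of flight (rows) and basic weather condition (columns).

    Only the top_n most frequent phases are kept. Rows with a missing phase or
    weather value are dropped by pd.crosstab.
    """
    top_phases = events["phase_flt_spec"].value_counts().head(top_n).index
    subset = events[events["phase_flt_spec"].isin(top_phases)]
    return pd.crosstab(subset["phase_flt_spec"], subset["wx_cond_basic"])
```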

Causal Factor Analysis

Mine ~71K findings records to identify top contributing factors (pilot error, mechanical failure, weather) and how they interact.
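A minimal sketch for ranking findings by how many events cite them. The column names finding_description and cause_factor are hypothetical placeholders for the "factor codes and descriptions" in the findings table, and cause_factor is assumed to hold "C" for a cause and "F" for a contributing factor; check the shipped schema before relying on either.

```python
import pandas as pd


def top_findings(findings: pd.DataFrame, n: int = 10,
                 desc_col: str = "finding_description",
                 cause_col: str = "cause_factor") -> pd.DataFrame:
    """Rank finding descriptions by the number of distinct events that cite them.

    desc_col and cause_col are hypothetical column names (see lead-in).
    """
    # Count each (event, finding, cause/factor) combination once.
    dedup = findings.drop_duplicates(["ev_id", desc_col, cause_col])
    # Rows: finding descriptions; columns: one count per cause/factor flag.
    table = pd.crosstab(dedup[desc_col], dedup[cause_col])
    # Distinct events per description, aligned on the description index.
    table["events"] = dedup.groupby(desc_col)["ev_id"].nunique()
    return table.sort_values("events", ascending=False).head(n)
```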

NLP on Investigator Narratives

~28K free-text narratives and probable cause statements written by NTSB investigators: a rich corpus for NLP, summarization, and classification tasks.
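As a simple baseline before summarization or classification, a term-frequency pass over the probable cause text can show which words dominate the corpus. A minimal sketch, assuming the narr_cause column used in the Quick Start; the stop-word list is a tiny illustrative one, and a real pipeline would use a fuller list or a proper tokenizer.

```python
import re
from collections import Counter

import pandas as pd

# Tiny illustrative stop-word list; extend it for real analysis.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on", "was", "with", "due"}


def term_frequencies(texts: pd.Series, top_n: int = 20):
    """Return the top_n most common lower-case word tokens across the texts."""
    counts = Counter()
    for text in texts.dropna():
        # Letters and apostrophes only, so "pilot's" stays one token.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)
```

Usage: term_frequencies(narratives["narr_cause"], top_n=25).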

Geospatial Risk Mapping

Plot accident locations with lat/lon coordinates to identify high-risk corridors, airports, and terrain features.
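Before plotting, it can help to bin the latitude and longitude columns into a coarse grid to find dense areas. A minimal sketch, assuming decimal-degree coordinates as described in the schema; the 1-degree default cell size and the function name are illustrative choices.

```python
import numpy as np
import pandas as pd


def grid_counts(events: pd.DataFrame, cell_deg: float = 1.0) -> pd.DataFrame:
    """Count events per lat/lon grid cell of size cell_deg degrees.

    Rows with missing or out-of-range coordinates are dropped. Each cell is
    labelled by its south-west corner.
    """
    coords = events[["latitude", "longitude"]].dropna()
    valid = coords["latitude"].between(-90, 90) & coords["longitude"].between(-180, 180)
    coords = coords[valid]
    cells = pd.DataFrame({
        "cell_lat": np.floor(coords["latitude"] / cell_deg) * cell_deg,
        "cell_lon": np.floor(coords["longitude"] / cell_deg) * cell_deg,
    })
    return (cells.groupby(["cell_lat", "cell_lon"]).size()
            .rename("events").reset_index()
            .sort_values("events", ascending=False, ignore_index=True))
```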

Pilot Training & Risk Models

Correlate crew experience (total hours, hours in type) with accident outcomes to build pilot risk scoring models.
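A descriptive first step toward such a model is to bucket crew records by total flight hours and compare outcomes across buckets. A minimal sketch, assuming a hypothetical total_hours column in flight_crew and that ev_highest_injury stores "fatal" (case-insensitive) as the schema description suggests. Shares are computed per crew record, so multi-crew events count once per crew member; the bucket edges are arbitrary.

```python
import pandas as pd


def fatal_share_by_experience(events: pd.DataFrame, crew: pd.DataFrame,
                              hours_col: str = "total_hours",
                              bins=(0, 100, 500, 1000, 5000, float("inf"))) -> pd.DataFrame:
    """Count and fatal share of crew records per total-hours bucket.

    hours_col is a hypothetical column name for total flight hours.
    """
    df = crew[["ev_id", hours_col]].merge(
        events[["ev_id", "ev_highest_injury"]], on="ev_id", how="inner")
    df = df.dropna(subset=[hours_col])
    # Left-closed buckets: [0, 100), [100, 500), ...
    df["hours_bucket"] = pd.cut(df[hours_col], bins=list(bins), right=False)
    # Missing injury levels compare as not fatal.
    df["fatal"] = df["ev_highest_injury"].str.lower().eq("fatal")
    return df.groupby("hours_bucket", observed=True)["fatal"].agg(["count", "mean"])
```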

Insurance Actuarial Modeling

Score aircraft makes, models, and operation types by historical accident rates, injury severity, and total loss frequency.
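A minimal sketch of per make/model severity statistics, joining aircraft to events on ev_id. The column names make, model, and damage, and the damage value "destroyed", are hypothetical placeholders; check the aircraft table's actual names and damage codes first. Shares are computed over aircraft records, and min_events lets you drop models with too few events to be meaningful.

```python
import pandas as pd


def severity_by_model(events: pd.DataFrame, aircraft: pd.DataFrame,
                      make_col: str = "make", model_col: str = "model",
                      damage_col: str = "damage", destroyed_value: str = "destroyed",
                      min_events: int = 1) -> pd.DataFrame:
    """Per make/model: distinct events, share with fatalities, share destroyed."""
    df = aircraft[["ev_id", make_col, model_col, damage_col]].merge(
        events[["ev_id", "inj_tot_f"]], on="ev_id", how="inner")
    # Missing fatality counts are treated as zero.
    df["fatal"] = df["inj_tot_f"].fillna(0) > 0
    df["destroyed"] = df[damage_col].astype(str).str.lower().eq(destroyed_value.lower())
    out = (df.groupby([make_col, model_col])
             .agg(events=("ev_id", "nunique"),
                  fatal_share=("fatal", "mean"),
                  destroyed_share=("destroyed", "mean"))
             .reset_index())
    out = out[out["events"] >= min_events]
    return out.sort_values("events", ascending=False, ignore_index=True)
```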

Events Table Schema

Primary table: ntsb_aviation_events.parquet / ntsb_aviation_events.csv, one row per accident or incident. Join to other tables on ev_id.

| Field | Type | Description |
|---|---|---|
| ev_id | string | NTSB event identifier (primary key) |
| ntsb_no | string | NTSB accident number |
| ev_date | string | Accident date (YYYY-MM-DD) |
| ev_year | int | Accident year (derived) |
| ev_time | string | Accident time (HH:MM, local) |
| ev_city | string | City of occurrence |
| ev_state | string | US state (2-letter code) |
| ev_country | string | Country code |
| ev_type | string | Event type (accident or incident) |
| ev_highest_injury | string | Highest injury level (fatal/serious/minor/none) |
| inj_tot_f | int | Total fatalities |
| inj_tot_s | int | Total serious injuries |
| inj_tot_m | int | Total minor injuries |
| inj_tot_n | int | Total uninjured |
| total_injuries | int | Sum of fatal + serious + minor (derived) |
| acft_fire | int | Aircraft fire involved (1/0) |
| acft_explode | int | Aircraft explosion (1/0) |
| latitude | float | Decimal latitude of accident site |
| longitude | float | Decimal longitude of accident site |
| apt_id | string | Nearest airport identifier (FAA/ICAO) |
| apt_name | string | Nearest airport name |
| wx_cond_basic | string | Basic weather conditions (VMC/IMC) |
| light_cond | string | Light conditions (day/night/dusk/dawn) |
| phase_flt_spec | string | Phase of flight (takeoff/cruise/landing, etc.) |
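When working from the CSV rather than Parquet, it can help to pin identifier columns to strings and to know how the derived fields relate to the raw ones. A minimal sketch under two assumptions: ev_date is stored as YYYY-MM-DD (as the schema states), and missing injury counts are treated as 0 when recomputing total_injuries, which may differ from how the shipped field handles them. Both function names are hypothetical.

```python
import pandas as pd


def load_events_csv(path_or_buffer):
    """Load the events CSV, keeping identifier-like columns as strings.

    Reading ev_id, ntsb_no and apt_id as str stops pandas from guessing a
    numeric type (and dropping leading zeros) for codes that look numeric.
    """
    return pd.read_csv(path_or_buffer,
                       dtype={"ev_id": str, "ntsb_no": str, "apt_id": str, "ev_date": str})


def add_derived_fields(events: pd.DataFrame) -> pd.DataFrame:
    """Recompute ev_year and total_injuries from the raw columns."""
    out = events.copy()
    # Unparseable dates become NaT, giving a missing ev_year.
    out["ev_year"] = pd.to_datetime(out["ev_date"], format="%Y-%m-%d", errors="coerce").dt.year
    # total_injuries = fatal + serious + minor; uninjured (inj_tot_n) is excluded.
    # Treating missing counts as 0 is an assumption of this sketch.
    injury_cols = ["inj_tot_f", "inj_tot_s", "inj_tot_m"]
    out["total_injuries"] = out[injury_cols].fillna(0).sum(axis=1).astype(int)
    return out
```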

Quick Start

```python
import pandas as pd

events = pd.read_parquet("ntsb_aviation_events.parquet")
narratives = pd.read_parquet("ntsb_aviation_narratives.parquet")

# Fatal accidents by year
fatal = events[events["inj_tot_f"] > 0]
print(fatal.groupby("ev_year")["inj_tot_f"].sum())

# Accidents by phase of flight
print(events["phase_flt_spec"].value_counts().head(10))

# Join events with probable cause narratives
df = events.merge(narratives[["ev_id", "narr_cause"]], on="ev_id", how="left")
print(df[["ev_date", "ev_state", "ev_highest_injury", "narr_cause"]].head(5))
```

Pricing

Sample

Free

1,000 rows from events table (CSV) + schema docs

Public Domain

Download Sample

Complete

$79

Full dataset: all six tables (events, aircraft, narratives, findings, flight_crew, engines), CSV + Parquet

Commercial License

Buy Complete

Annual

$149/yr

Full dataset + annual updates as NTSB adds new accident records

Commercial License

Subscribe

Data Provenance

Source: National Transportation Safety Board (NTSB), Aviation Accident Database

Portal: NTSB Aviation Data

Format: Source is a Microsoft Access MDB file (avall.zip). Our pipeline extracts all tables using mdb-tools, normalises column names, parses dates, and outputs clean CSV + Parquet.
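For readers who want to reproduce a raw extraction themselves, here is a minimal sketch of the mdb-tools step. It is not ClarityStorm's actual pipeline: it assumes mdb-tools is installed, uses mdb-tables -1 (one table name per line) and mdb-export (CSV to stdout), and applies a hypothetical column-normalisation rule (lower-case, non-alphanumeric runs collapsed to underscores).

```python
import io
import re
import subprocess

import pandas as pd


def normalise_column(name: str) -> str:
    """Lower-case a column name and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def list_tables(mdb_path: str):
    """List table names in an Access file (mdb-tables -1 prints one per line)."""
    out = subprocess.run(["mdb-tables", "-1", mdb_path],
                         check=True, capture_output=True, text=True)
    return [t for t in out.stdout.splitlines() if t.strip()]


def export_table(mdb_path: str, table: str) -> pd.DataFrame:
    """Export one table as CSV via mdb-export and load it with every column as str."""
    out = subprocess.run(["mdb-export", mdb_path, table],
                         check=True, capture_output=True, text=True)
    # Reading as strings defers type decisions (dates, codes) to a later cleaning step.
    df = pd.read_csv(io.StringIO(out.stdout), dtype=str)
    df.columns = [normalise_column(c) for c in df.columns]
    return df
```

Typical use after unzipping avall.zip: {t: export_table(path, t) for t in list_tables(path)}, where path points at the extracted .mdb file.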

Update frequency: NTSB updates the database as investigations are completed. Annual subscribers receive yearly refreshes.
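Because NTSB adds and revises records as investigations close, one way to fold a newer extract into a local copy is an upsert keyed on ev_id. A minimal sketch, assuming rows from the newer file should replace every older row with the same ev_id (which also suits child tables with several rows per event); if a refresh arrives as a complete snapshot, simply replacing the old files is simpler. The function name is hypothetical.

```python
import pandas as pd


def apply_refresh(current: pd.DataFrame, refresh: pd.DataFrame, key: str = "ev_id") -> pd.DataFrame:
    """Return current with all rows for keys present in refresh replaced by the refresh rows.

    Keys that appear only in current are kept unchanged.
    """
    kept = current[~current[key].isin(refresh[key])]
    return pd.concat([kept, refresh], ignore_index=True)
```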

License: NTSB accident data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (table extraction, date standardisation, coordinate parsing, derived fields).

Need custom data cuts, API access, or bulk licensing?

Contact Sales