NTSB Aviation Accidents 1982–Present
National Transportation Safety Board Aviation Accident Database
Every aviation accident and incident investigated by the NTSB since 1982 — cleaned, structured, and ready for analysis. Includes 6 relational tables covering events, aircraft, probable cause narratives, causal findings, crew experience, and engines. 30K+ accidents across four decades of US aviation.
Why not just download it from NTSB?
You can — it's public domain. Here's what we saved you:
- ✓ Extracted from Microsoft Access MDB format — the only distribution format NTSB provides. No Access license required.
- ✓ All 6 relational tables extracted and output as clean CSV + Parquet with consistent join keys on ev_id
- ✓ Standardized date/time fields and normalized column names across the raw Access schema
- ✓ Added derived fields: ev_year, total_injuries (sum of fatal, serious, and minor injuries), and coordinate validation
- ✓ Parquet output: works directly in pandas, DuckDB, and Spark without any format conversion
⏱ Skip ~4–5 hours of setup. Getting data out of a proprietary Access file is the first obstacle — we handled it.
What you'd need to do yourself:
- Download avall.zip from the NTSB aviation data portal
- Install mdb-tools (Linux) or use Microsoft Access (Windows-only) to read the MDB file
- Extract all 6 tables individually and reconcile null handling across the raw Access types
- Parse dates and normalize column names — the raw schema uses NTSB internal naming conventions
- Build your own Parquet conversion and validate join integrity across 6 tables
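If you do go the DIY route, the extraction and renaming steps above might look roughly like the following Python sketch. It assumes mdb-tools is installed and on your PATH and that you have already unzipped avall.zip to an .mdb file. The helper names and the snake_case rule are illustrative choices, not the code our pipeline runs.

```python
import io
import re
import subprocess

import pandas as pd


def parse_table_list(stdout: str) -> list[str]:
    """Split `mdb-tables -1` output (one table name per line) into a clean list."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def list_tables(mdb_path: str) -> list[str]:
    """List the tables in an Access file using mdb-tables."""
    result = subprocess.run(
        ["mdb-tables", "-1", mdb_path], capture_output=True, text=True, check=True
    )
    return parse_table_list(result.stdout)


def to_snake_case(name: str) -> str:
    """Normalize a raw column name, e.g. 'evDate' -> 'ev_date', 'Aircraft_Key' -> 'aircraft_key'."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # split camelCase boundaries
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # spaces/punctuation -> underscore
    return name.strip("_").lower()


def export_table(mdb_path: str, table: str) -> pd.DataFrame:
    """Dump one table to CSV text with mdb-export and load it into pandas."""
    result = subprocess.run(
        ["mdb-export", mdb_path, table], capture_output=True, text=True, check=True
    )
    # Load every column as text; parse dates and numbers in a separate, explicit step.
    df = pd.read_csv(io.StringIO(result.stdout), dtype=str)
    df.columns = [to_snake_case(c) for c in df.columns]
    return df
```

Looping over list_tables(path) and calling export_table(path, name).to_parquet(...) for each name gives one Parquet file per table (to_parquet needs pyarrow or fastparquet installed). Dates still arrive as text at this point, and join integrity across the tables is not checked.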
~30K accidents · 40+ years · 6 tables · ~28K narratives
Included Tables
All tables are joined on ev_id (event identifier).
| Table | Rows | Contents |
|---|---|---|
| Events | ~30K | One row per accident/incident: dates, location, injury counts, weather, phase of flight |
| Aircraft | ~31K | Aircraft involved in each event: make, model, year, engine count, damage level |
| Narratives | ~28K | Full narrative text and probable cause statements written by NTSB investigators |
| Findings | ~71K | Causal and contributing findings per event: factor codes and descriptions |
| Crew | ~32K | Pilot and crew records: certificate level, total hours, hours in type |
| Engines | ~28K | Engine details per aircraft: type, manufacturer, horsepower |
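Because aircraft, crew, and engine rows can repeat an ev_id (an event can involve more than one aircraft or crew member), it is worth checking keys before you merge. A minimal pandas sketch, assuming each table is loaded into a DataFrame with an ev_id column; orphan_ev_ids is a hypothetical helper name.

```python
import pandas as pd


def orphan_ev_ids(parent: pd.DataFrame, child: pd.DataFrame, key: str = "ev_id") -> pd.Series:
    """Return key values that appear in `child` but have no matching row in `parent`."""
    merged = child[[key]].drop_duplicates().merge(
        parent[[key]].drop_duplicates(), on=key, how="left", indicator=True
    )
    # indicator=True adds a _merge column; "left_only" means no match in parent.
    return merged.loc[merged["_merge"] == "left_only", key].reset_index(drop=True)
```

For example, orphan_ev_ids(events, narratives) should come back empty if every narrative belongs to a known event. When you merge a one-to-many table onto events, expect event rows to repeat, so aggregate before summing injury counts.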
Use Cases
Analyze accident rates by aircraft type, phase of flight, weather condition, and geography over four decades of US aviation history.
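Counting comes first. The tables listed above record accidents, not flight activity, so turning counts into true rates needs an exposure denominator (such as flight hours) from another source. A sketch of the counting step using events columns from the schema below (ev_year, wx_cond_basic, phase_flt_spec); yearly_counts is a hypothetical helper.

```python
import pandas as pd


def yearly_counts(events: pd.DataFrame, by: str) -> pd.DataFrame:
    """Count events per ev_year, with one column per value of `by` (e.g. wx_cond_basic)."""
    # groupby drops rows where `by` is missing (dropna=True by default).
    counts = events.groupby(["ev_year", by]).size()
    return counts.unstack(by, fill_value=0).sort_index()
```

yearly_counts(events, "wx_cond_basic") gives a VMC/IMC split per year; passing "phase_flt_spec" instead gives the phase-of-flight breakdown over time.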
Mine ~71K findings records to identify top contributing factors (pilot error, mechanical failure, weather) and how they interact.
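The findings columns aren't documented on this page, so this sketch takes the factor column name as a parameter (factor_col is a placeholder; substitute the real column). It ranks factors by how many distinct events cite them, then counts factor pairs cited in the same event as a simple first look at interactions.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def top_factors(findings: pd.DataFrame, factor_col: str, n: int = 10) -> pd.DataFrame:
    """Rank factors by the number of distinct events that cite them."""
    per_event = findings[["ev_id", factor_col]].dropna().drop_duplicates()
    counts = per_event[factor_col].value_counts().head(n)
    return counts.rename_axis(factor_col).reset_index(name="events")


def factor_pairs(findings: pd.DataFrame, factor_col: str, n: int = 10) -> list:
    """Most common pairs of distinct factors cited together in the same event."""
    pairs: Counter = Counter()
    for _, factors in findings.dropna(subset=[factor_col]).groupby("ev_id")[factor_col]:
        # Sort so ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(set(factors)), 2):
            pairs[pair] += 1
    return pairs.most_common(n)
```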
~28K narrative records, with free-text descriptions and probable cause statements written by NTSB investigators: a rich corpus for NLP, summarization, and classification tasks.
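One way to turn the narratives into a supervised classification dataset is to label each probable cause statement with its event's highest injury level. This sketch uses only narr_cause (from the Quick Start) and ev_highest_injury (from the events schema); the helper names and the whitespace cleanup are illustrative assumptions.

```python
import pandas as pd


def build_corpus(events: pd.DataFrame, narratives: pd.DataFrame) -> pd.DataFrame:
    """Return (ev_id, text, label) rows: probable cause text labeled by highest injury."""
    corpus = narratives[["ev_id", "narr_cause"]].merge(
        events[["ev_id", "ev_highest_injury"]], on="ev_id", how="inner"
    )
    corpus = corpus.dropna(subset=["narr_cause", "ev_highest_injury"])
    # Collapse runs of whitespace (including line breaks) into single spaces.
    corpus = corpus.assign(
        narr_cause=corpus["narr_cause"].str.replace(r"\s+", " ", regex=True).str.strip()
    )
    return corpus.rename(columns={"narr_cause": "text", "ev_highest_injury": "label"})


def share_mentioning(corpus: pd.DataFrame, pattern: str) -> pd.Series:
    """Fraction of statements per label whose text matches `pattern` (case-insensitive regex)."""
    hits = corpus["text"].str.contains(pattern, case=False, regex=True)
    return hits.groupby(corpus["label"]).mean()
```

The text/label frame can feed any classifier; share_mentioning is a quick keyword baseline, e.g. how often "airspeed" appears in fatal versus non-injury events.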
Plot accident locations with lat/lon coordinates to identify high-risk corridors, airports, and terrain features.
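Before mapping, drop coordinates that can't be right. This sketch keeps rows with latitude in [-90, 90] and longitude in [-180, 180], treats (0, 0) as a placeholder rather than a real site (an assumption, not a documented NTSB convention), and counts events per fixed-size grid cell as a crude hotspot measure. It is not a description of the coordinate validation we apply.

```python
import numpy as np
import pandas as pd


def valid_coords(events: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose latitude/longitude are present, in range, and not (0, 0)."""
    lat, lon = events["latitude"], events["longitude"]
    in_range = lat.between(-90, 90) & lon.between(-180, 180)  # NaN compares False
    placeholder = (lat == 0) & (lon == 0)  # assumed placeholder, not a real site
    return events[in_range & ~placeholder]


def hotspots(events: pd.DataFrame, cell_deg: float = 0.5, n: int = 10) -> pd.Series:
    """Count events per cell_deg x cell_deg grid cell, keyed by the cell's south-west corner."""
    pts = valid_coords(events)
    cells = pd.DataFrame({
        "lat_cell": np.floor(pts["latitude"] / cell_deg) * cell_deg,
        "lon_cell": np.floor(pts["longitude"] / cell_deg) * cell_deg,
    })
    return cells.groupby(["lat_cell", "lon_cell"]).size().sort_values(ascending=False).head(n)
```

Fixed degree cells shrink in east-west width toward the poles, so for serious corridor analysis an equal-area grid or a clustering method is a better fit.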
Correlate crew experience (total hours, hours in type) with accident outcomes to build pilot risk scoring models.
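Crew column names aren't listed on this page, so hours_col below is a placeholder for whichever column holds total flight hours. The sketch bins hours into bands and reports the share of crew rows whose event had at least one fatality (inj_tot_f > 0). The crew table can hold several people per event, so you would normally filter to one role (for example the pilot in command) first; that filter is omitted because the role column isn't documented here. The band edges are arbitrary.

```python
import pandas as pd


def fatal_share_by_hours(
    crew: pd.DataFrame,
    events: pd.DataFrame,
    hours_col: str,
    bins=(0, 100, 500, 1000, 5000, float("inf")),
) -> pd.Series:
    """Share of crew rows whose event had at least one fatality, per total-hours band."""
    df = crew[["ev_id", hours_col]].merge(events[["ev_id", "inj_tot_f"]], on="ev_id", how="inner")
    df = df.dropna(subset=[hours_col, "inj_tot_f"])
    df = df.assign(
        # right=False gives half-open bands [0, 100), [100, 500), ... so 0 hours is included.
        hours_band=pd.cut(df[hours_col], bins=list(bins), right=False),
        fatal=df["inj_tot_f"] > 0,
    )
    # observed=False keeps empty bands in the output (as NaN) instead of dropping them.
    return df.groupby("hours_band", observed=False)["fatal"].mean()
```

This only describes accident outcomes by experience; without the hours flown by pilots who did not crash, it is not a risk rate on its own.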
Score aircraft makes, models, and operation types by historical accident rates, injury severity, and total loss frequency.
Events Table Schema
Primary table: ntsb_aviation_events.parquet / ntsb_aviation_events.csv, one row per accident. Join to other tables on ev_id.
| Field | Type | Description |
|---|---|---|
| ev_id | string | NTSB event identifier (primary key) |
| ntsb_no | string | NTSB accident number |
| ev_date | string | Accident date (YYYY-MM-DD) |
| ev_year | int | Accident year (derived) |
| ev_time | string | Accident time (HH:MM, local) |
| ev_city | string | City of occurrence |
| ev_state | string | US state (2-letter code) |
| ev_country | string | Country code |
| ev_type | string | Accident vs. incident |
| ev_highest_injury | string | Highest injury level (fatal/serious/minor/none) |
| inj_tot_f | int | Total fatalities |
| inj_tot_s | int | Total serious injuries |
| inj_tot_m | int | Total minor injuries |
| inj_tot_n | int | Total uninjured |
| total_injuries | int | Sum of fatal + serious + minor (derived) |
| acft_fire | int | Aircraft fire involved (1/0) |
| acft_explode | int | Aircraft explosion (1/0) |
| latitude | float | Decimal latitude of accident site |
| longitude | float | Decimal longitude of accident site |
| apt_id | string | Nearest airport identifier (FAA/ICAO) |
| apt_name | string | Nearest airport name |
| wx_cond_basic | string | Basic weather conditions (VMC/IMC) |
| light_cond | string | Light conditions (day/night/dusk/dawn) |
| phase_flt_spec | string | Phase of flight (takeoff/cruise/landing, etc.) |
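The two derived fields can be recomputed from the raw columns, which is handy for sanity checks or for rebuilding them after you filter or edit rows. A sketch, assuming missing injury counts should count as zero (our exact null handling may differ); add_derived_fields is a hypothetical helper.

```python
import pandas as pd


def add_derived_fields(events: pd.DataFrame) -> pd.DataFrame:
    """Recompute ev_year from ev_date and total_injuries = fatal + serious + minor."""
    # Unparseable or missing dates become NaT, and their year becomes <NA>.
    dates = pd.to_datetime(events["ev_date"], format="%Y-%m-%d", errors="coerce")
    # Uninjured (inj_tot_n) is deliberately excluded from the total.
    injuries = events[["inj_tot_f", "inj_tot_s", "inj_tot_m"]].fillna(0)
    return events.assign(
        ev_year=dates.dt.year.astype("Int64"),
        total_injuries=injuries.sum(axis=1).astype(int),
    )
```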
Quick Start
import pandas as pd
events = pd.read_parquet("ntsb_aviation_events.parquet")
narratives = pd.read_parquet("ntsb_aviation_narratives.parquet")
# Total fatalities per year (fatal accidents only)
fatal = events[events["inj_tot_f"] > 0]
print(fatal.groupby("ev_year")["inj_tot_f"].sum())
# Accidents by phase of flight
print(events["phase_flt_spec"].value_counts().head(10))
# Join events with probable cause narratives
df = events.merge(narratives[["ev_id", "narr_cause"]], on="ev_id", how="left")
print(df[["ev_date", "ev_state", "ev_highest_injury", "narr_cause"]].head(5))
Pricing
$79
Full dataset: all 6 tables (events, aircraft, narratives, findings, crew, engines), CSV + Parquet
Commercial License
Data Provenance
Source: National Transportation Safety Board (NTSB), Aviation Accident Database
Portal: NTSB Aviation Data
Format: Source is a Microsoft Access MDB file (avall.zip). Our pipeline extracts all tables using mdb-tools, normalises column names, parses dates, and outputs clean CSV + Parquet.
Update frequency: NTSB updates the database as investigations are completed. Annual subscribers receive yearly refreshes.
License: NTSB accident data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (table extraction, date standardisation, coordinate parsing, derived fields).
Need custom data cuts, API access, or bulk licensing?
Contact Sales