Datasets
Analysis-ready datasets built from public government sources, delivered as CSV and Parquet with normalized schemas, derived fields, and cross-dataset linking done for you. Free samples are available; full datasets require a commercial license.
Dataset Bundles — save up to 40%
Themed bundles: Transportation Safety, Aviation, ESG, Healthcare & Finance, or all datasets.
All datasets are refreshed quarterly.
22 yrs of NJ crash data — 5 linked tables, schema normalized
Records: ~6.3M crash events
Coverage: 22 years, 21 counties
Format: CSV + Parquet
Updated Q1 2025
49 yrs of schema changes harmonized — all 50 states
Records: ~5.3M records across 3 tables
Coverage: 49 years, all 50 states
Format: CSV + Parquet
Updated Q1 2025
Component hierarchy parsed — pairs with Recalls for full defect pipeline
Records: ~2.2M complaints
Coverage: 30+ years, all makes & models
Format: CSV + Parquet
Updated Q1 2025
Extracted from Access MDB — 6 relational tables, no Access needed
Records: ~30K accidents across 6 tables
Coverage: 40+ years, all 50 states
Format: CSV + Parquet
Updated Q1 2025
Pre-2010 + post-2010 files merged — pairs with Complaints
Records: 50K+ recall campaigns
Coverage: 57+ years, all makes & models
Format: CSV + Parquet
Updated Q1 2025
37 annual files merged — facilities + releases linked, ESG-ready
Records: 3M+ release records across 2 tables
Coverage: 37 years, all US states & territories
Format: CSV + Parquet
Updated Q2 2025
4 quarterly zips merged, deduplicated across 7 tables
Records: 1.5M+ reports across 7 tables
Coverage: 2023 (Q1–Q4), deduplicated
Format: CSV + Parquet
Updated Q2 2025
8 annual ITA files merged — DART & TCIR rates computed, NAICS normalized
Records: 4M+ establishment-year records
Coverage: 8 years, all US states
Format: CSV + Parquet
Updated Q3 2025
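For readers unfamiliar with the DART and TCIR fields mentioned above, here is a minimal sketch of how such rates are conventionally derived using the standard OSHA incidence-rate formula (cases × 200,000 / hours worked). The function and parameter names are hypothetical and are not the dataset's actual column names; how the published fields were computed may differ in detail.

```python
from typing import Optional

# OSHA's conventional base: 100 full-time workers x 40 hours/week x 50 weeks/year.
OSHA_BASE_HOURS = 200_000


def incidence_rate(cases: int, hours_worked: float) -> Optional[float]:
    """Cases per 100 full-time-equivalent workers; None if hours are missing or zero."""
    if hours_worked <= 0:
        return None
    return cases * OSHA_BASE_HOURS / hours_worked


def dart_rate(dafw_cases: int, djtr_cases: int, hours_worked: float) -> Optional[float]:
    """DART: cases with Days Away from work plus cases with job Restriction or Transfer."""
    return incidence_rate(dafw_cases + djtr_cases, hours_worked)


def tcir(deaths: int, dafw_cases: int, djtr_cases: int, other_cases: int,
         hours_worked: float) -> Optional[float]:
    """TCIR: all recordable cases (deaths, DAFW, DJTR, other recordable)."""
    return incidence_rate(deaths + dafw_cases + djtr_cases + other_cases, hours_worked)


# Hypothetical establishment-year: 3 DAFW, 2 DJTR, 5 other cases, 400,000 hours worked.
print(dart_rate(3, 2, 400_000))      # 2.5
print(tcir(0, 3, 2, 5, 400_000))     # 5.0
```

Returning None for zero or missing hours keeps establishments without reported hours from producing division errors or misleading rates.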
Stable Parquet snapshot from live API — 3.75M+ narratives preserved
Records: 14M+ complaints, 3.75M+ with narratives
Coverage: 13+ years, all US states
Format: CSV + Parquet
Updated Q3 2025
84 monthly BTS files merged — delay causes, cancellations, year-partitioned Parquet
Records: 35M+ domestic flights
Coverage: 7 years, 20+ carriers
Format: CSV + Parquet
Updated Q2 2026
225 .csv.gz files merged — damage amounts parsed to USD, 3 linked tables, 74 years of weather
Records: 2M+ storm events
Coverage: 74 years, all US states
Format: CSV + Parquet
Updated Q4 2025
The only pre-joined vehicle safety dataset — NHTSA + FARS cross-linked by make/model/year
Records: 33K+ vehicle-year profiles
Coverage: 1985–2025, all makes
Format: CSV + Parquet
Updated Q4 2025
2.7M+ paid claims enriched — ZIP risk scores, coverage ratios, building characteristics
Records: 2.7M+ paid claims
Coverage: 48+ years, all 50 states
Format: CSV + Parquet
Updated Q1 2026
234 US navigable waterway locks — physical specs, chamber dimensions, gate types, capacity tiers
Records: 234 locks, 56 fields
Coverage: 26 states, 60+ waterways, all USACE districts
Format: CSV + Parquet
Every reported strike since 1990 — species risk tiers, damage severity, engine ingestion flags
Records: 341,090 strike reports, 113 fields
Coverage: 36+ years, 2,764 airports, 952 species
Format: CSV + Parquet
18 yrs of county-level deaths — mortality tiers, YPLL, crude rates, 3,100+ counties, all 50 states + DC
Records: 55K+ county × year mortality records
Coverage: 18 years, 3,100+ counties, all 50 states + DC
Format: CSV + Parquet
Updated Q2 2026
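As a rough illustration of the derived mortality fields named above, the sketch below computes a crude rate per 100,000 population and years of potential life lost (YPLL). It assumes a reference age of 75, a common convention that the listing does not confirm for this dataset, and all function and parameter names are hypothetical.

```python
from typing import Iterable, Optional


def crude_rate(deaths: int, population: int, per: int = 100_000) -> Optional[float]:
    """Deaths per `per` population; None when the population is missing or zero."""
    if population <= 0:
        return None
    return deaths * per / population


def ypll(ages_at_death: Iterable[float], reference_age: float = 75) -> float:
    """Sum of years lost before the reference age; deaths at or after it contribute 0."""
    return sum(reference_age - age for age in ages_at_death if age < reference_age)


# Hypothetical county-year: 50 deaths in a population of 25,000.
print(crude_rate(50, 25_000))        # 200.0
# Hypothetical ages at death; only those under 75 add to the total.
print(ypll([60, 80, 70, 30]))        # 65
```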
35 yrs of crop loss records pre-joined with NOAA drought/weather by county — 130+ crops, all 50 states
Records: 4.2M+ indemnity records across 3 tables
Coverage: 35 years, all US crop counties
Format: CSV + Parquet
Updated Q2 2026