Dataset Changelog
Schema changes, quarterly data refreshes, and enrichment updates for all ClarityStorm datasets, newest first.
Refresh schedule: Annual subscribers receive updated data as upstream sources publish. Most federal datasets publish annually (Q1/Q2). Real-time sources (CFPB, NHTSA) are refreshed quarterly.
All Changes
USDA Crop Insurance Indemnities + Weather 1989–2023: Initial release
2M+ county × crop × year indemnity records pre-joined with NOAA drought/weather. 130+ crops, all 50 states.
CDC WONDER Mortality 1999–2016: Initial release
55K+ county × year mortality records from CDC WONDER Compressed Mortality File. 18 years, 3,133 counties, mortality tiers, YPLL.
DOT Airline On-Time Performance 2018–Present: Pipeline re-run and production release
Full pipeline re-run with 58 monthly BTS files. 35M+ domestic flights, Stripe + S3 delivery live.
FEMA NFIP Flood Insurance Claims 1978–Present: Initial release
2.7M+ paid NFIP claims since 1978. Flood zone classifications, coverage amounts, damage breakdowns, ZIP-level risk scores.
NOAA Storm Events Database 1950–Present: Initial release
225 .csv.gz files merged: details, fatalities, and locations tables. Damage strings parsed to USD.
Vehicle Safety Profile — Complaints + Recalls + Fatal Crashes: Initial release
Pre-joined NHTSA Complaints + Recalls + FARS fatal crashes. 33K+ vehicle-year profiles. One row per make/model/year.
NHTSA Vehicle Complaints 1995–Present: Enrichment, computed fields added
Added derived fields: complaint_category (AI-classified), component_group, and the is_crash flag.
FDA FAERS Drug Adverse Events 2023: Enrichment, computed fields added
Added seriousness_label (AI-classified), reaction_category, and reporter_type_normalized.
OSHA Workplace Injury & Illness 2016–Present: Initial release
8 annual ITA files merged. DART & TCIR rates computed. NAICS zero-padded.
CFPB Consumer Financial Complaints 2011–Present: Initial release
Stable Parquet snapshot from live API. 14M+ complaints, 3.75M+ with narratives.
CFPB Consumer Financial Complaints 2011–Present: Enrichment, computed fields added
Added sentiment_score, product_category_normalized, and timely_response_flag.
DOT Airline On-Time Performance 2018–Present: Initial release
84 monthly BTS files merged. 35M+ flights. Year-partitioned Parquet.
EPA Toxic Release Inventory 1987–Present: Initial release
37 annual files merged. Facilities + releases linked. Environmental Justice linker included.
FDA FAERS Drug Adverse Events 2023: Initial release
4 quarterly zips merged, deduplicated across 7 tables. 1.5M+ reports.
NJ Crash Records 2001–2022: Initial release
First publication. 22 years, 5 linked tables, ~6.3M crash events. CSV + Parquet.
NHTSA FARS Fatal Crashes 1975–2023: Initial release
49 years, 3 normalized tables. Schema harmonized across SAS format changes.
NHTSA Vehicle Complaints 1995–Present: Initial release
~2.2M complaints. Component hierarchy parsed from free-text. CSV + Parquet.
NHTSA Vehicle Recalls 1967–Present: Initial release
Pre-2010 + post-2010 files merged. 57+ years of recall campaigns.
NTSB Aviation Accidents 1982–Present: Initial release
6 relational tables extracted from Access MDB. No Access required.
By Dataset
NJ Crash Records 2001–2022
First publication. 22 years, 5 linked tables, ~6.3M crash events. CSV + Parquet.
NHTSA FARS Fatal Crashes 1975–2023
49 years, 3 normalized tables. Schema harmonized across SAS format changes.
NHTSA Vehicle Complaints 1995–Present
Added derived fields: complaint_category (AI-classified), component_group, and the is_crash flag.
~2.2M complaints. Component hierarchy parsed from free-text. CSV + Parquet.
NHTSA Vehicle Recalls 1967–Present
Pre-2010 + post-2010 files merged. 57+ years of recall campaigns.
NTSB Aviation Accidents 1982–Present
6 relational tables extracted from Access MDB. No Access required.
EPA Toxic Release Inventory 1987–Present
37 annual files merged. Facilities + releases linked. Environmental Justice linker included.
FDA FAERS Drug Adverse Events 2023
Added seriousness_label (AI-classified), reaction_category, and reporter_type_normalized.
4 quarterly zips merged, deduplicated across 7 tables. 1.5M+ reports.
OSHA Workplace Injury & Illness 2016–Present
8 annual ITA files merged. DART & TCIR rates computed. NAICS zero-padded.
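For readers who want to sanity-check the computed rates, here is a minimal sketch of the standard OSHA incidence-rate formula (cases × 200,000 ÷ total hours worked). The function and argument names are hypothetical and chosen for illustration; they are not the dataset's column names, and this is not necessarily how the pipeline computes the fields.

```python
from typing import Optional

# 200,000 = 100 full-time workers x 40 hours/week x 50 weeks/year
HOURS_BASE = 200_000


def incidence_rate(cases: int, total_hours_worked: float) -> Optional[float]:
    """Cases per 100 full-time-equivalent workers; None if hours are missing or zero."""
    if total_hours_worked is None or total_hours_worked <= 0:
        return None
    return cases * HOURS_BASE / total_hours_worked


def dart_rate(days_away_cases: int, restricted_or_transfer_cases: int,
              total_hours_worked: float) -> Optional[float]:
    """DART: cases with days away from work, job restriction, or transfer."""
    return incidence_rate(days_away_cases + restricted_or_transfer_cases,
                          total_hours_worked)


def tcir(deaths: int, days_away_cases: int, restricted_or_transfer_cases: int,
         other_recordable_cases: int, total_hours_worked: float) -> Optional[float]:
    """TCIR: all recordable cases, including deaths and other recordable cases."""
    total = (deaths + days_away_cases + restricted_or_transfer_cases
             + other_recordable_cases)
    return incidence_rate(total, total_hours_worked)
```

Returning None for zero or missing hours, rather than raising or returning 0, keeps establishments with unusable hour counts distinguishable from those with a genuine zero rate.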
CFPB Consumer Financial Complaints 2011–Present
Stable Parquet snapshot from live API. 14M+ complaints, 3.75M+ with narratives.
Added sentiment_score, product_category_normalized, and timely_response_flag.
FEMA NFIP Flood Insurance Claims 1978–Present
2.7M+ paid NFIP claims since 1978. Flood zone classifications, coverage amounts, damage breakdowns, ZIP-level risk scores.
USDA Crop Insurance Indemnities + Weather 1989–2023
2M+ county × crop × year indemnity records pre-joined with NOAA drought/weather. 130+ crops, all 50 states.
CDC WONDER Mortality 1999–2016
55K+ county × year mortality records from CDC WONDER Compressed Mortality File. 18 years, 3,133 counties, mortality tiers, YPLL.
DOT Airline On-Time Performance 2018–Present
Full pipeline re-run with 58 monthly BTS files. 35M+ domestic flights, Stripe + S3 delivery live.
84 monthly BTS files merged. 35M+ flights. Year-partitioned Parquet.
NOAA Storm Events Database 1950–Present
225 .csv.gz files merged: details, fatalities, and locations tables. Damage strings parsed to USD.
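As an illustration of what parsing damage strings to USD can look like, the sketch below converts suffixed values such as "10.00K" or "2.5M" to a dollar amount. It assumes the raw strings use K, M, and B suffixes for thousands, millions, and billions. The function name `parse_damage_usd` and the choice to return None for blank or unparseable values are assumptions for illustration, not the pipeline's confirmed behavior.

```python
from typing import Optional

# Assumed suffix convention: K = thousands, M = millions, B = billions
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_damage_usd(raw: Optional[str]) -> Optional[float]:
    """Convert a suffixed damage string to US dollars, or None if it cannot be parsed."""
    if raw is None:
        return None
    text = raw.strip().upper()
    if not text:
        return None
    if text[-1] in _MULTIPLIERS:
        number, factor = text[:-1], _MULTIPLIERS[text[-1]]
    else:
        number, factor = text, 1
    try:
        return float(number) * factor
    except ValueError:
        # Covers a bare suffix such as "K" and any non-numeric text
        return None


print(parse_damage_usd("10.00K"))  # 10000.0
print(parse_damage_usd("2.5M"))    # 2500000.0
```

Blank and malformed inputs map to None so that "no damage reported" stays distinct from an explicit "0.00K".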
Vehicle Safety Profile — Complaints + Recalls + Fatal Crashes
Pre-joined NHTSA Complaints + Recalls + FARS fatal crashes. 33K+ vehicle-year profiles. One row per make/model/year.