Dataset Changelog
Schema changes, quarterly data refreshes, and enrichment updates for all ClarityStorm datasets — newest first.
Refresh schedule: Annual subscribers receive updated data as upstream sources publish. Most federal datasets publish annually (Q1/Q2). Real-time sources (CFPB, NHTSA) are refreshed quarterly.
All Changes
Initial release (NOAA Storm Events Database 1950–Present)
225 .csv.gz files merged across the details, fatalities, and locations tables. Damage strings parsed to USD.
Initial release (Vehicle Safety Profile: Complaints + Recalls + Fatal Crashes)
Pre-joined NHTSA Complaints + Recalls + FARS fatal crashes. 33K+ vehicle-year profiles. One row per make/model/year.
Enrichment: computed fields added (NHTSA Vehicle Complaints 1995–Present)
Added derived fields complaint_category (AI-classified), component_group, and the is_crash flag.
Enrichment: computed fields added (FDA FAERS Drug Adverse Events 2023)
Added seriousness_label (AI-classified), reaction_category, and reporter_type_normalized.
Initial release (OSHA Workplace Injury & Illness 2016–Present)
8 annual ITA files merged. DART & TCIR rates computed. NAICS zero-padded.
Initial release (CFPB Consumer Financial Complaints 2011–Present)
Stable Parquet snapshot from live API. 14M+ complaints, 3.75M+ with narratives.
Enrichment: computed fields added (CFPB Consumer Financial Complaints 2011–Present)
Added sentiment_score, product_category_normalized, and timely_response_flag.
Initial release (DOT Airline On-Time Performance 2018–Present)
84 monthly BTS files merged. 35M+ flights. Year-partitioned Parquet.
Initial release (EPA Toxic Release Inventory 1987–Present)
37 annual files merged. Facilities + releases linked. Environmental Justice linker included.
Initial release (FDA FAERS Drug Adverse Events 2023)
4 quarterly zips merged, deduplicated across 7 tables. 1.5M+ reports.
Initial release (NJ Crash Records 2001–2022)
First publication. 22 years, 5 linked tables, ~6.3M crash events. CSV + Parquet.
Initial release (NHTSA FARS Fatal Crashes 1975–2023)
49 years, 3 normalized tables. Schema harmonized across the SAS format changes over that period.
Initial release (NHTSA Vehicle Complaints 1995–Present)
~2.2M complaints. Component hierarchy parsed from free-text. CSV + Parquet.
Initial release (NHTSA Vehicle Recalls 1967–Present)
Pre-2010 + post-2010 files merged. 57+ years of recall campaigns.
Initial release (NTSB Aviation Accidents 1982–Present)
6 relational tables extracted from Access MDB. No Access required.
By Dataset
NJ Crash Records 2001–2022
First publication. 22 years, 5 linked tables, ~6.3M crash events. CSV + Parquet.
NHTSA FARS Fatal Crashes 1975–2023
49 years, 3 normalized tables. Schema harmonized across the SAS format changes over that period.
NHTSA Vehicle Complaints 1995–Present
Enrichment: added derived fields complaint_category (AI-classified), component_group, and the is_crash flag.
Initial release: ~2.2M complaints. Component hierarchy parsed from free-text. CSV + Parquet.
NHTSA Vehicle Recalls 1967–Present
Pre-2010 + post-2010 files merged. 57+ years of recall campaigns.
NTSB Aviation Accidents 1982–Present
6 relational tables extracted from Access MDB. No Access required.
EPA Toxic Release Inventory 1987–Present
37 annual files merged. Facilities + releases linked. Environmental Justice linker included.
FDA FAERS Drug Adverse Events 2023
Enrichment: added seriousness_label (AI-classified), reaction_category, and reporter_type_normalized.
Initial release: 4 quarterly zips merged, deduplicated across 7 tables. 1.5M+ reports.
OSHA Workplace Injury & Illness 2016–Present
8 annual ITA files merged. DART & TCIR rates computed. NAICS zero-padded.
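The DART and TCIR rates mentioned above can be illustrated with OSHA's standard incidence-rate convention: cases multiplied by 200,000 (the hours 100 full-time employees work in a year), divided by total hours worked. This is a minimal sketch assuming that convention is the one applied to this dataset; the function and parameter names are hypothetical and are not the dataset's actual column names.

```python
from typing import Optional

# 100 full-time employees x 40 hours/week x 50 weeks/year (standard OSHA base).
BASE_HOURS = 200_000


def incidence_rate(cases: int, hours_worked: float) -> Optional[float]:
    """Return cases per 100 full-time workers, or None if hours are unusable."""
    if hours_worked <= 0:
        return None
    return cases * BASE_HOURS / hours_worked


def tcir(deaths: int, days_away: int, transfer_restriction: int,
         other_recordable: int, hours_worked: float) -> Optional[float]:
    """Total Case Incidence Rate: every recordable case counts."""
    total = deaths + days_away + transfer_restriction + other_recordable
    return incidence_rate(total, hours_worked)


def dart(days_away: int, transfer_restriction: int,
         hours_worked: float) -> Optional[float]:
    """DART rate: only days-away and job transfer/restriction cases count."""
    return incidence_rate(days_away + transfer_restriction, hours_worked)


if __name__ == "__main__":
    # Hypothetical establishment: 6 recordable cases over 400,000 hours.
    print(tcir(0, 2, 1, 3, 400_000))
    print(dart(2, 1, 400_000))
```

Returning None for zero or negative hours keeps establishments with missing hour totals from producing a division error or a misleading rate.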
CFPB Consumer Financial Complaints 2011–Present
Initial release: stable Parquet snapshot from live API. 14M+ complaints, 3.75M+ with narratives.
Enrichment: added sentiment_score, product_category_normalized, and timely_response_flag.
DOT Airline On-Time Performance 2018–Present
84 monthly BTS files merged. 35M+ flights. Year-partitioned Parquet.
NOAA Storm Events Database 1950–Present
225 .csv.gz files merged — details, fatalities, and locations tables. Damage strings parsed to USD.
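As an illustration of what parsing damage strings to USD can involve: the raw Storm Events damage fields are typically strings with a magnitude suffix, such as "10.00K" or "2.5M". The sketch below assumes the suffixes H, K, M, and B stand for hundreds, thousands, millions, and billions, and treats blank, malformed, or suffix-only values as missing. The function name and these handling choices are assumptions for illustration, not necessarily the rules used to build this dataset.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Assumed suffix meanings; H (hundreds) appears mostly in older records.
_MULTIPLIERS = {
    "H": Decimal(100),
    "K": Decimal(1_000),
    "M": Decimal(1_000_000),
    "B": Decimal(1_000_000_000),
}


def parse_damage_usd(raw: Optional[str]) -> Optional[Decimal]:
    """Convert a damage string such as '10.00K' to a USD amount."""
    if raw is None:
        return None
    text = raw.strip().upper()
    if not text:
        return None
    multiplier = Decimal(1)
    if text[-1] in _MULTIPLIERS:
        multiplier = _MULTIPLIERS[text[-1]]
        text = text[:-1]
    if not text:
        # A bare suffix such as "K" is ambiguous, so treat it as missing.
        return None
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    if not value.is_finite():
        # Reject strings like "NaN" that Decimal would otherwise accept.
        return None
    return value * multiplier


if __name__ == "__main__":
    for sample in ["10.00K", "2.5M", "1B", "", "K", "n/a"]:
        print(repr(sample), parse_damage_usd(sample))
```

Decimal is used instead of float so that amounts like "0.10K" do not pick up binary rounding error before aggregation.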
Vehicle Safety Profile — Complaints + Recalls + Fatal Crashes
Pre-joined NHTSA Complaints + Recalls + FARS fatal crashes. 33K+ vehicle-year profiles. One row per make/model/year.