Blog
Python tutorials and analysis guides for our government datasets — from loading raw data to building production ML models.
Learn how to load and analyze 49 years of US fatal crash data from NHTSA FARS using Python and pandas. Discover fatality trends, DUI patterns, and geospatial hotspots.
Use 2.2M NHTSA vehicle complaint narratives to build an NLP defect detection model in Python. Cluster complaint text, predict recall likelihood, and profile high-risk makes.
Explore 40+ years of US aviation accident data from NTSB using Python. Analyze accident rates, aircraft types, injury patterns, and probable causes across 30K events.
Use Python to map 37 years of industrial toxic releases from the EPA Toxics Release Inventory. Analyze PFAS and carcinogen trends and facility-level pollution across US states.
Load and analyze 20+ years of New Jersey traffic crash records using Python and pandas. Discover injury hotspots, seasonal patterns, and road-type risk factors in NJ's public crash dataset.
Use Python to analyze 400K+ NHTSA vehicle recall records. Discover the brands, components, and model years most affected by safety recalls from 1966 to today.
Analyze 14M+ CFPB consumer financial complaints with Python. Discover which products and companies draw the most complaints, and apply NLP to mine the free-text narratives.
Use 35M+ BTS airline records to build a flight delay classifier in Python. Covers feature engineering from departure time and route data, training a gradient boosting model, and evaluating real-world prediction performance.
Explore 75 years of US storm damage data with Python. Analyze property and crop losses by event type, decade, and state using the NOAA Storm Events Database — 2M+ events from 1950 to present.
Analyze 4M+ OSHA establishment injury records with Python. Benchmark DART and TCIR rates by industry, track year-over-year safety trends, and identify the highest-risk sectors using the OSHA ITA dataset.
Explore 341K+ FAA wildlife strike reports covering 1990–2025 with Python. Identify the riskiest airports, the most dangerous species, and the costliest phases of flight using the cleaned ClarityStorm FAA Wildlife Strikes dataset.
Explore 2.7M+ paid FEMA NFIP flood insurance claims from 1978 to today. Identify high-risk ZIP codes, track rising claim costs, and build flood exposure models in Python.
Analyze 2M+ USDA crop insurance indemnity records pre-joined with NOAA drought and weather data. Identify climate-vulnerable crops, counties, and loss trends in Python.
Explore 18 years of county-level death rates across 3,100+ US counties. Identify geographic mortality disparities and compute population-adjusted trends in Python.