Blog

Python tutorials and analysis guides for our government datasets — from loading raw data to building production ML models.

Learn how to load and analyze 49 years of US fatal crash data from NHTSA FARS using Python and pandas. Discover fatality trends, DUI patterns, and geospatial hotspots.

FARS · traffic safety · Python · pandas
Read tutorial →

Use 2.2M NHTSA vehicle complaint narratives to build an NLP defect detection model in Python. Cluster complaint text, predict recall likelihood, and profile high-risk makes.

NHTSA · NLP · vehicle safety · scikit-learn
Read tutorial →

Explore 40+ years of US aviation accident data from NTSB using Python. Analyze accident rates, aircraft types, injury patterns, and probable causes across 30K events.

NTSB · aviation safety · Python · pandas
Read tutorial →

Use Python to map 37 years of industrial toxic releases from the EPA Toxics Release Inventory. Analyze PFAS, carcinogen trends, and facility-level pollution across US states.

EPA TRI · environment · PFAS · geospatial
Read tutorial →

Load and analyze 20+ years of New Jersey traffic crash records using Python and pandas. Discover injury hotspots, seasonal patterns, and road-type risk factors in NJ's public crash dataset.

NJ crash data · traffic safety · Python · pandas
Read tutorial →

Use Python to analyze 400K+ NHTSA vehicle recall campaigns. Discover the brands, components, and model years most affected by safety recalls from 1966 to today.

NHTSA · vehicle recalls · auto safety · Python
Read tutorial →

Analyze 14M+ CFPB consumer financial complaints with Python. Discover which products generate the most complaints, which companies top the list, and how NLP unlocks the free-text narratives.

CFPB · fintech · NLP · Python
Read tutorial →

Use 35M+ BTS airline records to build a flight delay classifier in Python. Covers feature engineering from departure time and route data, training a gradient boosting model, and evaluating real-world prediction performance.

DOT · airlines · machine learning · Python
Read tutorial →

Explore 75 years of US storm damage data with Python. Analyze property and crop losses by event type, decade, and state using the NOAA Storm Events Database — 2M+ events from 1950 to present.

NOAA · climate · Python · pandas
Read tutorial →

Analyze 4M+ OSHA establishment injury records with Python. Benchmark DART and TCIR rates by industry, track year-over-year safety trends, and identify the highest-risk sectors using the OSHA ITA dataset.

OSHA · workplace safety · Python · pandas
Read tutorial →

Explore 341K+ FAA wildlife strike reports covering 1990–2025 with Python. Identify the riskiest airports, most dangerous species, and costliest strike phases using the cleaned ClarityStorm FAA Wildlife Strikes dataset.

FAA · aviation safety · wildlife strikes · Python
Read tutorial →

Explore 2.7M+ paid FEMA flood insurance claims from 1978 to today. Identify high-risk ZIP codes, track rising claim costs, and build flood exposure models in Python.

FEMA · flood risk · insurance · climate
Read tutorial →

Analyze 2M+ USDA crop insurance indemnity records pre-joined with NOAA drought and weather data. Identify climate-vulnerable crops, counties, and loss trends in Python.

USDA · agriculture · crop insurance · climate
Read tutorial →

Explore 18 years of county-level death rates from CDC WONDER across 3,100+ US counties. Identify geographic mortality disparities and compute population-adjusted trends in Python.

CDC WONDER · mortality · public health · epidemiology
Read tutorial →