OSHA Workplace Injury & Illness Dataset 2016–Present
OSHA Injury Tracking Application (ITA) — Form 300A Annual Summary
Under 29 CFR Part 1904, establishments with 20–249 employees in designated high-hazard industries, and all establishments with 250 or more employees, must electronically submit their OSHA Form 300A annual summary to the Injury Tracking Application. This dataset captures every establishment report from 2016 through 2023, covering workplace injuries, illnesses, fatalities, days away from work, and computed incident rates (DART, TCIR). Records are cleaned, type-cast, and concatenated across years for longitudinal analysis.
Why not pull it directly from the OSHA portal?
You can — it's public domain. Here's what we saved you:
- ✓ 8 annual CSV files concatenated into one — OSHA publishes separate downloads per year; we merged and standardised column names that shifted between releases
- ✓ Typed numeric columns — raw files mix integers, empty strings, and formatting artifacts; every numeric field is cast to nullable Int64 with errors handled
- ✓ NAICS codes zero-padded and validated — raw files contain unpadded codes, mixed lengths, and occasional garbage values; we normalise to consistent 6-digit strings
- ✓ TCIR and DART rates computed — OSHA publishes the raw counts but not the standard incident rates; we compute both per 200,000 employee-hours
- ✓ Deduplication across years — OSHA occasionally publishes corrected files that overlap with prior releases; we identify and remove duplicate establishment-year rows
- ✓ Parquet output — OSHA only provides CSV; columnar Parquet is 5–10× faster for analytical queries
⏱ Skip the 8-file download-and-merge, column name reconciliation, and rate computation. Ready for analysis in minutes.
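For readers who want a feel for the cleaning steps listed above, here is a minimal sketch of two of them: casting messy count columns to nullable Int64 and dropping duplicate establishment-year rows. It is an illustration under stated assumptions, not the actual ClarityStorm pipeline. The function names, the toy rows, and the choice of deduplication key are all hypothetical.

```python
import pandas as pd


def cast_counts(raw: pd.DataFrame, count_cols: list[str]) -> pd.DataFrame:
    """Cast messy count columns to nullable Int64; unparseable values become <NA>."""
    out = raw.copy()
    for col in count_cols:
        # Drop thousands separators and surrounding whitespace before parsing.
        text = out[col].astype(str).str.replace(",", "", regex=False).str.strip()
        out[col] = pd.to_numeric(text, errors="coerce").round().astype("Int64")
    return out


def drop_duplicate_reports(df: pd.DataFrame, key_cols: list[str]) -> pd.DataFrame:
    """Keep the last row seen for each establishment-year key."""
    return df.drop_duplicates(subset=key_cols, keep="last").reset_index(drop=True)


# Toy rows standing in for two overlapping yearly files (made-up values).
raw = pd.DataFrame(
    {
        "estab_name": ["Acme Mill", "Acme Mill", "Birch Foods"],
        "state": ["OH", "OH", "WI"],
        "survey_year": [2019, 2019, 2019],
        "total_dafw_cases": ["3", " 4 ", ""],
    }
)
# Hypothetical key: the real pipeline's notion of "same establishment" is not documented here.
key = ["estab_name", "state", "survey_year"]
clean = drop_duplicate_reports(cast_counts(raw, ["total_dafw_cases"]), key)
print(clean)
```

Keeping the last duplicate assumes that later rows come from corrected files. If the load order differs, sort by a release date before deduplicating.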
- 4M+ establishment-year records
- 700K+ unique establishments
- 800+ NAICS industries
- 8 years of data
Use Cases
Compare DART and TCIR rates across industries, establishment sizes, and geographies. Identify which NAICS sectors consistently exceed industry-average injury rates and by how much.
Use establishment-level injury and illness rates to inform workers' compensation underwriting models. Join against NAICS benchmarks to flag high-risk sectors or specific establishments.
Score industries on workplace safety for ESG portfolios. Track whether sectors are improving their injury rates year-over-year and identify companies with consistently high fatality rates.
Analyse which industries and establishment sizes are most frequently non-compliant. Study incident rate trends before and after major OSHA enforcement actions.
Research illness patterns (respiratory conditions, hearing loss, skin disorders) by industry. Link with BLS employment data to estimate population-level occupational disease burden.
Study how injury rates correlate with minimum wage laws, union density, or OSHA inspection intensity. Ideal for academic researchers, think tanks, and policy analysts.
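As one way to approach the benchmarking and underwriting use cases above, the sketch below attaches a per-industry, per-year median DART rate to each establishment and flags rows that exceed a multiple of it. The function name, the `multiple` threshold, and the toy values are hypothetical. The median is only one possible benchmark, and a pooled rate may suit some analyses better.

```python
import pandas as pd


def flag_above_benchmark(df: pd.DataFrame, multiple: float = 2.0) -> pd.DataFrame:
    """Attach the NAICS-by-year median DART rate and flag rows above `multiple` times it."""
    rated = df[df["dart_rate"].notna()].copy()
    rated["benchmark_dart"] = rated.groupby(["naics_code", "survey_year"])[
        "dart_rate"
    ].transform("median")
    rated["above_benchmark"] = rated["dart_rate"] > multiple * rated["benchmark_dart"]
    return rated


# Toy data with made-up rates; column names follow the schema in this document.
sample = pd.DataFrame(
    {
        "estab_name": ["A", "B", "C", "D"],
        "naics_code": ["311612", "311612", "311612", "238160"],
        "survey_year": [2022, 2022, 2022, 2022],
        "dart_rate": [1.0, 2.0, 9.0, 4.0],
    }
)
flagged = flag_above_benchmark(sample)
print(flagged[["estab_name", "benchmark_dart", "above_benchmark"]])
```

An industry with a single reporting establishment is its own benchmark and is never flagged, so a minimum group size may be worth adding in practice.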
Schema
Single table (osha_injuries), delivered as both osha_injuries.csv and osha_injuries.parquet. One row per establishment per survey year.
| Field | Type | Description |
|---|---|---|
| survey_year | int | Survey year the data covers (2016–2023) |
| estab_name | string | Establishment name as reported to OSHA |
| street_address | string | Street address of the establishment |
| city | string | City |
| state | string | US state (2-letter code) |
| zip3 | string | First 3 digits of ZIP code (for privacy, OSHA publishes ZIP3 only) |
| naics_code | string | 6-digit NAICS industry code (zero-padded) |
| industry_description | string | Industry description corresponding to NAICS code |
| size_class | string | Establishment size class (e.g., 1–10, 11–19, 20–249, 250+) |
| annual_average_employees | int | Annual average number of employees |
| total_hours_worked | int | Total employee hours worked during the year |
| no_injuries_illnesses | int | 1 if zero injuries/illnesses were recorded, 0 otherwise |
| total_deaths | int | Total work-related fatalities |
| total_dafw_cases | int | Cases with days away from work |
| total_djtr_cases | int | Cases with job transfer or restriction |
| total_other_cases | int | Other recordable cases (no time away/transfer) |
| total_dafw_days | int | Total days away from work |
| total_djtr_days | int | Total days of job transfer or restriction |
| total_injuries | int | Total injuries (sub-type of recordable cases) |
| total_skin_disorders | int | Occupational skin disorders |
| total_resp_conditions | int | Respiratory conditions |
| total_poisonings | int | Poisonings |
| total_hearing_loss | int | Hearing loss cases |
| total_other_illnesses | int | Other illness types |
| tcir_rate | float | Total Case Incident Rate per 200,000 hours worked (computed) |
| dart_rate | float | DART rate (Days Away, Restricted, Transfer) per 200,000 hours (computed) |
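A quick sanity check against the schema above can catch a truncated or mismatched file before analysis. The sketch below is a hypothetical helper, not part of the dataset. It verifies that the documented columns are present, that `naics_code` values are 6-digit strings, and that `survey_year` falls in the documented 2016–2023 range.

```python
import pandas as pd

# Column names copied from the schema table in this document.
EXPECTED_COLUMNS = [
    "survey_year", "estab_name", "street_address", "city", "state", "zip3",
    "naics_code", "industry_description", "size_class",
    "annual_average_employees", "total_hours_worked", "no_injuries_illnesses",
    "total_deaths", "total_dafw_cases", "total_djtr_cases", "total_other_cases",
    "total_dafw_days", "total_djtr_days", "total_injuries",
    "total_skin_disorders", "total_resp_conditions", "total_poisonings",
    "total_hearing_loss", "total_other_illnesses", "tcir_rate", "dart_rate",
]


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    missing = sorted(set(EXPECTED_COLUMNS) - set(df.columns))
    if missing:
        problems.append(f"missing columns: {missing}")
    if "naics_code" in df.columns:
        bad = ~df["naics_code"].astype(str).str.fullmatch(r"\d{6}")
        if bad.any():
            problems.append(f"{int(bad.sum())} naics_code values are not 6 digits")
    if "survey_year" in df.columns and not df["survey_year"].between(2016, 2023).all():
        problems.append("survey_year outside 2016-2023")
    return problems


# Deliberately broken toy frame to show what the checks report.
toy = pd.DataFrame({"survey_year": [2016, 2024], "naics_code": ["311612", "2381"]})
problems = check_schema(toy)
for line in problems:
    print(line)
```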
Quick Start
```python
import pandas as pd

# Reading Parquet requires a Parquet engine such as pyarrow
df = pd.read_parquet("osha_injuries.parquet")

# Records by year
print(df.groupby("survey_year").size().sort_index())

# Most dangerous industries (by median DART rate)
industry_dart = (
    df[df["dart_rate"].notna()]
    .groupby("industry_description")["dart_rate"]
    .median()
    .sort_values(ascending=False)
)
print(industry_dart.head(15))

# Establishments with the highest fatality counts
top_deaths = (
    df[df["total_deaths"] > 0]
    .groupby(["estab_name", "state"])["total_deaths"]
    .sum()
    .sort_values(ascending=False)
    .head(20)
)
print(top_deaths)

# Industry DART rate trend over time
trend = (
    df[df["dart_rate"].notna()]
    .groupby(["survey_year", "naics_code"])["dart_rate"]
    .median()
    .reset_index()
)
print(trend.head())

# Zero-injury establishments by sector
zero_injury = df[df["no_injuries_illnesses"] == 1]
print(zero_injury["industry_description"].value_counts().head(10))

# Establishments with declining DART rates (improving safety)
pivot = df.pivot_table(
    index=["estab_name", "state", "naics_code"],
    columns="survey_year",
    values="dart_rate",
    aggfunc="first",
)
# Compare the earliest and latest survey years; rows missing either year drop out
first_year, last_year = pivot.columns.min(), pivot.columns.max()
improving = pivot[pivot[last_year] < pivot[first_year]]
print(improving[[first_year, last_year]].head(20))
```

Pairs Well With
Join OSHA injury rates against Bureau of Labor Statistics employment counts by NAICS code to compute population-weighted industry risk scores and estimate total affected workers.
OSHA publishes its enforcement and inspection records separately. Cross-reference ITA injury rates with inspection frequency and citations to study whether OSHA presence correlates with safer outcomes at the establishment level.
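The BLS pairing described above could look roughly like the sketch below. The `bls` frame and its `employment` column are hypothetical stand-ins, since real BLS extracts have their own layouts and NAICS granularity. The estimate also assumes each worker logs about 2,000 hours a year, so a rate per 200,000 hours reads as cases per 100 workers.

```python
import pandas as pd


def pooled_dart_by_naics(df: pd.DataFrame) -> pd.DataFrame:
    """Pool DART cases and hours per NAICS code, then apply the 200,000-hour rate formula."""
    valid = df[df["total_hours_worked"] > 0]
    valid = valid.assign(dart_cases=valid["total_dafw_cases"] + valid["total_djtr_cases"])
    pooled = valid.groupby("naics_code")[["dart_cases", "total_hours_worked"]].sum()
    pooled["pooled_dart"] = pooled["dart_cases"] * 200_000 / pooled["total_hours_worked"]
    return pooled.reset_index()


# Toy OSHA rows (made-up values) using column names from the schema.
osha = pd.DataFrame(
    {
        "naics_code": ["311612", "311612"],
        "total_dafw_cases": [2, 0],
        "total_djtr_cases": [1, 1],
        "total_hours_worked": [200_000, 600_000],
    }
)
# Hypothetical employment table; not an actual BLS file layout.
bls = pd.DataFrame({"naics_code": ["311612"], "employment": [50_000]})

merged = pooled_dart_by_naics(osha).merge(bls, on="naics_code", how="inner")
# Rate per 200,000 hours is roughly cases per 100 full-time workers.
merged["estimated_dart_cases"] = merged["pooled_dart"] * merged["employment"] / 100
print(merged[["naics_code", "pooled_dart", "estimated_dart_cases"]])
```

Pooling cases and hours before dividing weights large establishments by their exposure, which is usually closer to an industry-wide rate than a median of establishment rates.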
Pricing
$79
Full dataset — 4M+ establishment-year records (2016–2023), CSV + Parquet
Commercial License
Data Provenance
Source: U.S. Department of Labor — Occupational Safety and Health Administration (OSHA), Injury Tracking Application (ITA)
Portal: OSHA Establishment-Specific Injury and Illness Data
Coverage: 2016–2023. OSHA began mandatory electronic 300A submission in 2017 (for calendar year 2016 data). Annual files are published approximately 12 months after the survey year closes.
Who reports: Establishments with 20–249 employees in OSHA-defined high-hazard industries, plus all establishments with 250+ employees, are required to submit. Smaller or lower-hazard establishments may submit voluntarily.
ZIP3 privacy: OSHA publishes only the first 3 digits of the ZIP code (ZIP3) to protect establishment privacy in sparsely populated areas.
Computed fields: tcir_rate (Total Case Incident Rate) and dart_rate (Days Away, Restricted, Transfer Rate) are computed by ClarityStorm per the OSHA standard formula: (case count × 200,000) ÷ total hours worked. These are null when total hours worked is zero or missing.
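The formula above can be sketched in pandas as follows. The case definitions are an assumption: this version counts deaths, days-away, transfer/restriction, and other cases toward TCIR (Form 300A columns G through J) and days-away plus transfer/restriction cases toward DART. The text here does not confirm exactly which columns ClarityStorm sums, and the function names are hypothetical.

```python
import pandas as pd

RATE_BASE_HOURS = 200_000  # 100 full-time workers x 40 hours x 50 weeks


def incident_rate(cases: pd.Series, hours: pd.Series) -> pd.Series:
    """(cases x 200,000) / hours worked; NaN where hours are zero or missing."""
    hours = hours.astype("float64")
    valid_hours = hours.where(hours > 0)  # zero or missing hours become NaN
    return cases.astype("float64") * RATE_BASE_HOURS / valid_hours


def add_rates(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Assumption: TCIR counts all recordable cases, including deaths.
    total_cases = (
        out["total_deaths"] + out["total_dafw_cases"]
        + out["total_djtr_cases"] + out["total_other_cases"]
    )
    # Assumption: DART counts days-away and transfer/restriction cases only.
    dart_cases = out["total_dafw_cases"] + out["total_djtr_cases"]
    out["tcir_rate"] = incident_rate(total_cases, out["total_hours_worked"])
    out["dart_rate"] = incident_rate(dart_cases, out["total_hours_worked"])
    return out


# Toy rows with made-up counts; the second row has zero hours.
toy_counts = pd.DataFrame(
    {
        "total_deaths": [0, 0],
        "total_dafw_cases": [2, 1],
        "total_djtr_cases": [1, 0],
        "total_other_cases": [3, 0],
        "total_hours_worked": [400_000, 0],
    }
)
rates = add_rates(toy_counts)
print(rates[["tcir_rate", "dart_rate"]])
```

For the first row, 6 total cases over 400,000 hours gives a TCIR of 3.0, and 3 DART cases gives a DART rate of 1.5. The zero-hours row yields null rates, matching the rule stated above.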
Update frequency: OSHA releases new annual ITA data approximately once per year. Annual subscribers receive updates when ClarityStorm re-runs the pipeline.
License: OSHA ITA data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (normalisation, cross-year concatenation, rate computation, Parquet conversion).
Need custom data cuts, multi-year snapshots, or bulk licensing?
Contact Sales