
OSHA Workplace Injury & Illness Dataset 2016–Present

OSHA Injury Tracking Application (ITA) — Form 300A Annual Summary

Under 29 CFR Part 1904, establishments with 250 or more employees, and establishments with 20–249 employees in designated high-hazard industries, must electronically submit their OSHA Form 300A annual summary to the Injury Tracking Application. This dataset captures every establishment report from 2016 through 2023: workplace injuries, illnesses, fatalities, days away from work, and computed incident rates (DART, TCIR). The data is cleaned, type-cast, and concatenated across years for longitudinal analysis.

Sample: Public Domain · Paid: Commercial License · CSV + Parquet · Annual Updates · 2016–2023 · DART & TCIR Computed

Why not pull it directly from the OSHA portal?

You can — it's public domain. Here's what we saved you:

  • 8 annual CSV files concatenated into one — OSHA publishes separate downloads per year; we merged and standardised column names that shifted between releases
  • Typed numeric columns — raw files mix integers, empty strings, and formatting artifacts; every numeric field is cast to nullable Int64 with errors handled
  • NAICS codes zero-padded and validated — raw files contain unpadded codes, mixed lengths, and occasional garbage values; we normalise to consistent 6-digit strings
  • TCIR and DART rates computed — OSHA publishes the raw counts but not the standard incident rates; we compute both per 200,000 employee-hours
  • Deduplication across years — OSHA occasionally publishes corrected files that overlap with prior releases; we identify and remove duplicate establishment-year rows
  • Parquet output — OSHA only provides CSV; columnar Parquet is 5–10× faster for analytical queries

⏱ Skip the 8-file download-and-merge, column name reconciliation, and rate computation. Ready for analysis in minutes.
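For readers who want to see what the typing, NAICS, and deduplication steps involve, here is a minimal sketch in pandas. It is an illustration under stated assumptions, not the actual pipeline: the function name clean_raw, the deduplication key (estab_name, state, survey_year), and the choice to keep the last duplicate are all hypothetical, and the demo rows are made up.

```python
import pandas as pd

def clean_raw(raw: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Illustrative cleaning pass: typed numerics, padded NAICS, deduplicated rows."""
    out = raw.copy()
    for col in numeric_cols:
        # Drop thousands separators and whitespace; unparseable values become <NA>.
        text = out[col].astype(str).str.replace(",", "", regex=False).str.strip()
        out[col] = pd.to_numeric(text, errors="coerce").round().astype("Int64")
    # Keep NAICS as text: left-pad to 6 characters, null out anything that is not 6 digits.
    naics = out["naics_code"].astype(str).str.strip().str.zfill(6)
    out["naics_code"] = naics.where(naics.str.fullmatch(r"\d{6}"))
    # Assumed key for one row per establishment-year; later rows win as "corrections".
    key = ["estab_name", "state", "survey_year"]
    return out.drop_duplicates(subset=key, keep="last").reset_index(drop=True)

# Made-up rows that mimic the problems described above.
raw = pd.DataFrame({
    "estab_name": ["Acme Mill", "Acme Mill", "Bolt Works"],
    "state": ["OH", "OH", "TX"],
    "survey_year": [2016, 2016, 2017],
    "naics_code": ["11111", "11111", "N/A"],
    "total_deaths": ["0", "0", ""],
    "total_hours_worked": ["1,200", "1,250", "abc"],
})
clean = clean_raw(raw, ["total_deaths", "total_hours_worked"])
print(clean)
```

Coercing bad values to nulls rather than zeros matters here, because a zero in total_hours_worked or a case count would silently distort any rate computed from it.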

  • 4M+ establishment-year records
  • 700K+ unique establishments
  • 800+ NAICS industries
  • 8 years of data

Use Cases

Workplace Safety Benchmarking

Compare DART and TCIR rates across industries, establishment sizes, and geographies. Identify which NAICS sectors consistently exceed industry-average injury rates and by how much.

Insurance Underwriting

Use establishment-level injury and illness rates to inform workers' compensation underwriting models. Join against NAICS benchmarks to flag high-risk sectors or specific establishments.
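One way the benchmark join might look, as a rough sketch: compute the median DART rate for each NAICS code and survey year, then flag establishments well above it. The function name flag_high_risk, the 2x threshold, and the demo rows are assumptions for illustration only; a real underwriting model would choose its own benchmark and cutoff.

```python
import pandas as pd

def flag_high_risk(df: pd.DataFrame, multiple: float = 2.0) -> pd.DataFrame:
    """Flag rows whose DART rate exceeds `multiple` times the same-year NAICS median."""
    benchmark = df.groupby(["survey_year", "naics_code"])["dart_rate"].transform("median")
    out = df.assign(naics_median_dart=benchmark)
    # Rows with a null rate compare as False, so they are never flagged.
    out["high_risk"] = out["dart_rate"] > multiple * out["naics_median_dart"]
    return out

# Made-up establishments in a single industry and year.
demo = pd.DataFrame({
    "estab_name": ["A", "B", "C", "D"],
    "survey_year": [2022] * 4,
    "naics_code": ["238160"] * 4,
    "dart_rate": [1.0, 2.0, 3.0, 10.0],
})
flagged = flag_high_risk(demo)
print(flagged[["estab_name", "dart_rate", "naics_median_dart", "high_risk"]])
```

When an industry's median rate is zero, every establishment with a non-zero rate is flagged, so a minimum group size or an absolute floor is probably worth adding before using this on the full dataset.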

ESG & Responsible Investment

Score industries on workplace safety for ESG portfolios. Track whether sectors are improving their injury rates year-over-year and identify companies with consistently high fatality rates.
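A possible starting point for the year-over-year view is sketched below. It assumes the first two digits of naics_code are an acceptable stand-in for "sector" (some NAICS sectors, such as manufacturing, span several two-digit prefixes) and uses the median DART rate as the sector score; the function name and demo values are hypothetical.

```python
import pandas as pd

def sector_yoy_change(df: pd.DataFrame) -> pd.DataFrame:
    """Year-over-year change in median DART rate per 2-digit NAICS prefix."""
    sector = df["naics_code"].str[:2].rename("naics_sector")
    medians = (
        df.groupby([sector, "survey_year"])["dart_rate"]
        .median()
        .unstack("survey_year")  # rows: sector, columns: survey year
    )
    # Negative values mean the sector's median rate fell relative to the prior year.
    return medians.diff(axis=1)

# Made-up rates for two sectors over two years.
demo = pd.DataFrame({
    "naics_code": ["236220", "238160", "236220", "238160", "622110", "622110"],
    "survey_year": [2021, 2021, 2022, 2022, 2021, 2022],
    "dart_rate": [4.0, 6.0, 3.0, 5.0, 2.0, 2.5],
})
yoy = sector_yoy_change(demo)
print(yoy)
```

The first year's column is always null because there is no earlier year to compare against.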

OSHA Compliance Analytics

Analyse which industries and establishment sizes are most frequently non-compliant. Study incident rate trends before and after major OSHA enforcement actions.

Occupational Health Research

Research illness patterns (respiratory conditions, hearing loss, skin disorders) by industry. Link with BLS employment data to estimate population-level occupational disease burden.
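As a sketch of how illness patterns could be compared across industries, the example below pools each illness count and the hours worked by industry, then expresses each illness type per 200,000 hours. Pooling before dividing (rather than averaging establishment-level rates) is a design choice assumed here, and the function name and demo figures are made up.

```python
import pandas as pd

ILLNESS_COLS = [
    "total_skin_disorders",
    "total_resp_conditions",
    "total_poisonings",
    "total_hearing_loss",
    "total_other_illnesses",
]

def illness_rates_by_industry(df: pd.DataFrame) -> pd.DataFrame:
    """Pooled illness cases per 200,000 hours worked, one row per industry."""
    totals = df.groupby("industry_description")[ILLNESS_COLS + ["total_hours_worked"]].sum()
    hours = totals.pop("total_hours_worked")
    hours = hours.where(hours > 0)  # avoid dividing by zero hours
    return totals.mul(200_000).div(hours, axis=0)

# Made-up establishment rows.
demo = pd.DataFrame({
    "industry_description": ["Foundries", "Foundries", "Hospitals"],
    "total_skin_disorders": [0, 1, 2],
    "total_resp_conditions": [2, 0, 5],
    "total_poisonings": [0, 0, 0],
    "total_hearing_loss": [3, 1, 0],
    "total_other_illnesses": [1, 0, 4],
    "total_hours_worked": [100_000, 300_000, 500_000],
})
rates = illness_rates_by_industry(demo)
print(rates.round(2))
```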

Labor & Policy Analysis

Study how injury rates correlate with minimum wage laws, union density, or OSHA inspection intensity. Ideal for academic researchers, think tanks, and policy analysts.

Schema

Single table (osha_injuries), delivered as both osha_injuries.csv and osha_injuries.parquet. One row per establishment per survey year.

Field                      Type    Description
survey_year                int     Survey year the data covers (2016–2023)
estab_name                 string  Establishment name as reported to OSHA
street_address             string  Street address of the establishment
city                       string  City
state                      string  US state (2-letter code)
zip3                       string  First 3 digits of ZIP code (for privacy, OSHA publishes ZIP3 only)
naics_code                 string  6-digit NAICS industry code (zero-padded)
industry_description       string  Industry description corresponding to NAICS code
size_class                 string  Establishment size class (e.g., 1–10, 11–19, 20–249, 250+)
annual_average_employees   int     Annual average number of employees
total_hours_worked         int     Total employee hours worked during the year
no_injuries_illnesses      int     1 if zero injuries/illnesses were recorded, 0 otherwise
total_deaths               int     Total work-related fatalities
total_dafw_cases           int     Cases with days away from work
total_djtr_cases           int     Cases with job transfer or restriction
total_other_cases          int     Other recordable cases (no time away/transfer)
total_dafw_days            int     Total days away from work
total_djtr_days            int     Total days of job transfer or restriction
total_injuries             int     Total injuries (sub-type of recordable cases)
total_skin_disorders       int     Occupational skin disorders
total_resp_conditions      int     Respiratory conditions
total_poisonings           int     Poisonings
total_hearing_loss         int     Hearing loss cases
total_other_illnesses      int     Other illness types
tcir_rate                  float   Total Case Incident Rate per 200,000 hours worked (computed)
dart_rate                  float   DART rate (Days Away, Restricted, Transfer) per 200,000 hours (computed)

Quick Start

import pandas as pd

df = pd.read_parquet("osha_injuries.parquet")

# Records by year
print(df.groupby("survey_year").size().sort_index())

# Most dangerous industries (by median DART rate)
industry_dart = (
    df[df["dart_rate"].notna()]
    .groupby("industry_description")["dart_rate"]
    .median()
    .sort_values(ascending=False)
)
print(industry_dart.head(15))

# Establishments with the highest fatality counts
top_deaths = (
    df[df["total_deaths"] > 0]
    .groupby(["estab_name", "state"])["total_deaths"]
    .sum()
    .sort_values(ascending=False)
    .head(20)
)
print(top_deaths)

# Industry DART rate trend over time (median per NAICS code and year)
trend = (
    df[df["dart_rate"].notna()]
    .groupby(["survey_year", "naics_code"])["dart_rate"]
    .median()
    .reset_index()
)
print(trend.head())

# Zero-injury establishments by sector
zero_injury = df[df["no_injuries_illnesses"] == 1]
print(zero_injury["industry_description"].value_counts().head(10))

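# Establishments with declining DART rates (improving safety)
pivot = df.pivot_table(
    index=["estab_name", "state", "naics_code"],
    columns="survey_year",
    values="dart_rate",
    aggfunc="first",
)
# Compare the earliest and latest survey years; keep establishments present in both
first_year, last_year = pivot.columns.min(), pivot.columns.max()
change = (pivot[last_year] - pivot[first_year]).dropna()
print(change[change < 0].sort_values().head(20))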

Pairs Well With

External: BLS Occupational Employment Statistics

Join OSHA injury rates against Bureau of Labor Statistics employment counts by NAICS code to compute population-weighted industry risk scores and estimate total affected workers.
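The join might be sketched as follows. The BLS frame here is hypothetical: its column names (naics_code, employment) and values are assumptions, not the layout of any actual BLS file. The estimate relies on the fact that a rate per 200,000 hours is roughly a rate per 100 full-time workers (100 workers × 40 hours × 50 weeks), so it treats every employee as full-time.

```python
import pandas as pd

def weight_by_employment(osha_rates: pd.DataFrame, bls: pd.DataFrame):
    """Join industry DART rates to employment counts and derive weighted figures."""
    merged = osha_rates.merge(bls, on="naics_code", how="inner", validate="one_to_one")
    # Rate per 200,000 hours ~ cases per 100 full-time workers (assumes all full-time).
    merged["est_dart_cases"] = merged["dart_rate"] / 100 * merged["employment"]
    # Employment-weighted average rate across the joined industries.
    weighted_rate = (
        (merged["dart_rate"] * merged["employment"]).sum() / merged["employment"].sum()
    )
    return merged, weighted_rate

# Made-up industry rates and a hypothetical employment table.
osha_rates = pd.DataFrame({"naics_code": ["111111", "222222"], "dart_rate": [4.0, 2.0]})
bls = pd.DataFrame({"naics_code": ["111111", "222222"], "employment": [1000, 3000]})
merged, weighted_rate = weight_by_employment(osha_rates, bls)
print(merged)
print(weighted_rate)
```

Keeping naics_code as a string on both sides avoids losing leading zeros or mismatching dtypes during the merge.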

External: OSHA Enforcement / Inspection Data

OSHA publishes its enforcement and inspection records separately. Cross-reference ITA injury rates with inspection frequency and citations to study whether OSHA presence correlates with safer outcomes at the establishment level.

Pricing

  • Sample: Free. 1,000 rows (CSV) + schema docs. Public Domain.
  • Complete: $79. Full dataset: 4M+ establishment-year records (2016–2023), CSV + Parquet. Commercial License.
  • Annual: $149/yr. Full dataset + annual updates as OSHA releases new ITA data. Commercial License.

Data Provenance

Source: U.S. Department of Labor — Occupational Safety and Health Administration (OSHA), Injury Tracking Application (ITA)

Portal: OSHA Establishment-Specific Injury and Illness Data

Coverage: 2016–2023. OSHA began mandatory electronic 300A submission in 2017 (for calendar year 2016 data). Annual files are published approximately 12 months after the survey year closes.

Who reports: Establishments with 20–249 employees in OSHA-defined high-hazard industries, plus all establishments with 250+ employees, are required to submit. Smaller or lower-hazard establishments may submit voluntarily.

ZIP3 privacy: OSHA publishes only the first 3 digits of the ZIP code (ZIP3) to protect establishment privacy in sparsely populated areas.

Computed fields: tcir_rate (Total Case Incident Rate) and dart_rate (Days Away, Restricted, Transfer Rate) are computed by ClarityStorm per the OSHA standard formula: (case count × 200,000) ÷ total hours worked. These are null when total hours worked is zero or missing.
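The formula can be reproduced roughly as below. The mapping of columns to case counts is an assumption based on the usual OSHA definitions (DART counts days-away plus transfer/restriction cases; total cases also include other recordable cases and deaths) and is not confirmed as the exact pipeline logic; the function name and demo rows are illustrative.

```python
import pandas as pd

def add_incident_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Compute DART and TCIR as (case count x 200,000) / total hours worked."""
    out = df.copy()
    hours = out["total_hours_worked"].astype("float64")
    hours = hours.where(hours > 0)  # zero or missing hours -> null rate
    # Assumed case definitions (see the note above the code).
    dart_cases = out["total_dafw_cases"] + out["total_djtr_cases"]
    total_cases = dart_cases + out["total_other_cases"] + out["total_deaths"]
    out["dart_rate"] = dart_cases.astype("float64") * 200_000 / hours
    out["tcir_rate"] = total_cases.astype("float64") * 200_000 / hours
    return out

# Made-up rows: one normal establishment, one reporting zero hours.
demo = pd.DataFrame({
    "total_deaths": [0, 0],
    "total_dafw_cases": [3, 1],
    "total_djtr_cases": [2, 0],
    "total_other_cases": [5, 0],
    "total_hours_worked": [400_000, 0],
})
rated = add_incident_rates(demo)
print(rated[["dart_rate", "tcir_rate"]])
```

For the first row, 5 DART cases over 400,000 hours gives 5 × 200,000 ÷ 400,000 = 2.5, and 10 total cases gives 5.0; the second row stays null because its hours are zero.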

Update frequency: OSHA releases new annual ITA data approximately once per year. Annual subscribers receive updates when ClarityStorm re-runs the pipeline.

License: OSHA ITA data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (normalisation, cross-year concatenation, rate computation, Parquet conversion).

Need custom data cuts, multi-year snapshots, or bulk licensing?

Contact Sales