NJ Crash Records 2001-2022
AI-Ready Structured Dataset
22 years of New Jersey traffic crash records, cleaned and standardized for machine learning. Sourced directly from the NJ Department of Transportation NJTR-1 crash report system. Over 6.3 million records across all 21 NJ counties.
Why not just download it from NJ DOT?
You can — it's free. Here's what we saved you:
- ✓ Merged 22 separate annual CSV releases with shifting column names and types into one consistent file
- ✓ Normalized schema — dates, severity flags, county names, and boolean fields standardized across all years
- ✓ Joined 5 relational tables (Accidents, Vehicles, Drivers, Occupants, Pedestrians) with consistent keys
- ✓ Added derived fields: day_of_week, casualty_count, and coordinate validation for 2013+ records
- ✓ Parquet output: NJ DOT only ships CSV; we added a columnar format for fast analytical queries
⏱ Skip ~3–4 hours of wrangling. At $100/hr that's $300–400 of your time — this dataset pays for itself in the first use.
What you'd need to do yourself:
- Find and download 22 separate annual zip files from the NJ DOT crash portal
- Parse and align columns that changed names, types, and coding schemes across years
- Merge all files without creating duplicate or mismatched records
- Join the 5 sub-tables yourself using the raw crash ID relationships
- Convert to Parquet (not provided by NJ DOT)
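As a rough illustration of the column-alignment step listed above, the sketch below harmonizes two toy "annual releases" with different headers and date formats. The raw column names in `COLUMN_ALIASES`, the `harmonize` helper, and the upper-casing of county names are all hypothetical choices made for this example; they are not the actual NJ DOT headers or the processing used to build this dataset.

```python
import pandas as pd

# Hypothetical rename map: these raw headers are illustrative only,
# not the real NJ DOT column names.
COLUMN_ALIASES = {
    "Crash Date": "date",
    "CRASH_DATE": "date",
    "County Name": "county",
    "COUNTY": "county",
    "Total Killed": "total_killed",
    "TOT_KILLED": "total_killed",
}


def harmonize(frame: pd.DataFrame, year: int) -> pd.DataFrame:
    """Rename known aliases, normalize types, and tag the data year."""
    out = frame.rename(columns=COLUMN_ALIASES)
    # Each annual file is assumed to use one date format internally.
    out["date"] = pd.to_datetime(out["date"]).dt.strftime("%Y-%m-%d")
    out["county"] = out["county"].str.strip().str.upper()
    out["total_killed"] = pd.to_numeric(out["total_killed"]).astype("int64")
    out["year"] = year
    return out[["year", "date", "county", "total_killed"]]


# Two toy releases with different headers, date formats, and types.
raw_2001 = pd.DataFrame(
    {"Crash Date": ["03/15/2001"], "County Name": [" Essex "], "Total Killed": ["0"]}
)
raw_2022 = pd.DataFrame(
    {"CRASH_DATE": ["2022-07-04"], "COUNTY": ["BERGEN"], "TOT_KILLED": [1]}
)

merged = pd.concat(
    [harmonize(raw_2001, 2001), harmonize(raw_2022, 2022)],
    ignore_index=True,
)
print(merged)
```

Doing this for real means maintaining a map like this for every column that drifted across the 22 releases, which is where most of the wrangling time goes.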
| Crash Events | Years | Counties | Features |
|---|---|---|---|
| ~6.3M | 22 | 21 | 31 |
Use Cases
- Train multi-class classifiers to predict fatal/injury/PDO outcomes from road, weather, and speed features.
- Cluster crashes by lat/lon to identify high-risk corridors and intersections for infrastructure investment.
- Build route-level risk features for actuarial scoring, telematics, and underwriting models.
- Detect seasonal, weekly, and time-of-day crash patterns for traffic management and policy.
- Use real-world incident data for ADAS validation, simulation scenarios, and safety benchmarking.
- Support infrastructure investment decisions, Vision Zero initiatives, and Complete Streets analysis.
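To make the weekly and time-of-day use case concrete, here is a minimal sketch that counts crashes per weekday and hour. It uses the `day_of_week` and `time` columns described in the Schema section; the rows are made up for illustration. It also assumes `time` may have been read as an integer (pandas infers a numeric type for values like 0815 when loading CSV, which drops the leading zero), so the value is padded back to four digits before the hour is extracted.

```python
import pandas as pd

# Toy rows shaped like the dataset's columns; the values are invented.
df = pd.DataFrame(
    {
        "day_of_week": ["Monday", "Monday", "Friday", "Friday", "Friday"],
        # Integers simulate what read_csv produces for HHMM values.
        "time": [815, 840, 1730, 1755, 2310],
    }
)

# Restore the HHMM form, then take the first two characters as the hour.
df["hour"] = df["time"].astype(str).str.zfill(4).str[:2].astype(int)

# Rows: weekday, columns: hour of day, cells: number of crashes.
counts = df.groupby(["day_of_week", "hour"]).size().unstack(fill_value=0)
print(counts)
```

Passing `dtype={"time": str}` to `pd.read_csv` is another way to keep the leading zeros when loading the file.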
Schema
Primary table: nj_crash_records.csv / nj_crash_records.parquet (31 columns, one row per crash event). The table below lists 22 of them.
| Field | Type | Description |
|---|---|---|
| crash_id | string | NJ DOT unique crash identifier |
| year | int | Data year (2001-2022) |
| date | string | Crash date (YYYY-MM-DD) |
| day_of_week | string | Day name |
| time | string | Time (HHMM 24hr) |
| county | string | NJ county |
| municipality | string | Municipality name |
| severity | string | fatal, injury, or pdo |
| vehicle_count | int | Vehicles involved |
| total_killed | int | Fatalities |
| total_injured | int | Injuries |
| casualty_count | int | Total casualties |
| pedestrians_killed | int | Pedestrian fatalities |
| pedestrians_injured | int | Pedestrian injuries |
| alcohol_involved | string | Y/N flag |
| hazmat_involved | string | Y/N flag |
| weather | string | Weather condition code |
| road_condition | string | Surface condition code |
| light_condition | string | Lighting code |
| posted_speed | int | Speed limit (mph) |
| latitude | float | Decimal degrees (2013+) |
| longitude | float | Decimal degrees (2013+) |
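Because `latitude` and `longitude` are only populated for 2013+ records, spatial work usually starts by filtering to rows with usable coordinates. The sketch below shows one way to do that, and also runs a sanity check on `casualty_count`. Two things here are assumptions rather than documented behavior: the bounding box is an approximate outline of New Jersey chosen for this example, and `casualty_count` is assumed to equal `total_killed + total_injured`. The rows are invented.

```python
import pandas as pd

# Approximate New Jersey bounding box (an assumption for illustration).
LAT_MIN, LAT_MAX = 38.9, 41.4
LON_MIN, LON_MAX = -75.6, -73.8

# Toy rows shaped like the schema above; the values are invented.
df = pd.DataFrame(
    {
        "year": [2005, 2015, 2018, 2020],
        "latitude": [None, 40.7357, 0.0, 39.3643],
        "longitude": [None, -74.1724, 0.0, -74.4229],
        "total_killed": [0, 1, 0, 0],
        "total_injured": [2, 0, 0, 3],
        "casualty_count": [2, 1, 0, 3],
    }
)

# between() is False for missing values, so rows without coordinates drop out.
in_box = df["latitude"].between(LAT_MIN, LAT_MAX) & df["longitude"].between(
    LON_MIN, LON_MAX
)
geo = df[(df["year"] >= 2013) & in_box]
print(geo[["year", "latitude", "longitude"]])

# Assumed definition: casualties = killed + injured.
mismatches = df[df["casualty_count"] != df["total_killed"] + df["total_injured"]]
print(f"{len(mismatches)} rows where casualty_count differs from killed + injured")
```

If the mismatch count is non-zero on the real data, the assumed definition is probably wrong for those rows, so it is worth checking before relying on the field.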
Quick Start
```python
from datasets import load_dataset
import pandas as pd

# Load the 1,000-row sample (free)
ds = load_dataset("claritystorm/nj-crash-records-2001-2022")

# Or read the sample CSV directly with pandas
df = pd.read_csv(
    "https://huggingface.co/datasets/claritystorm/"
    "nj-crash-records-2001-2022/resolve/main/sample_1000.csv"
)

# Severity distribution
print(df["severity"].value_counts())
```
Pricing
$99
All tables (Accidents, Vehicles, Drivers, Occupants, Pedestrians) — CSV + Parquet
Commercial License
Buy Complete
Data Provenance
Source: New Jersey Department of Transportation (NJ DOT)
Portal: NJ DOT Crash Data Portal
License: Split licensing. Free sample (1,000 rows on Hugging Face): CC-BY 4.0. Paid tiers: ClarityStorm Commercial Data License — internal use only, no redistribution or resale of raw data. Derivative works (models, analysis, research papers) are permitted. NJ DOT crash data is factual government data collected under statutory duty and formally cleared for commercial resale.
Attribution: “NJ Crash Records 2001-2022, sourced from NJ DOT public crash data, processed by ClarityStorm Data.”
Need custom data cuts, API access, or bulk licensing?
Contact Sales