NHTSA Vehicle Complaints 1995–Present
Consumer Vehicle Safety Complaint Database
Every consumer vehicle safety complaint filed with the NHTSA Office of Defects Investigation since 1995, cleaned, structured, and enriched with parsed component hierarchies. The dataset holds roughly 2.2 million records covering all makes, models, and model years, each with a free-text complaint narrative.
~2.2M complaints · 30+ years · 100+ makes · 500+ components
Use Cases
Train NLP models on complaint narratives to identify emerging defect patterns before formal investigations are opened.
Build classifiers to predict which complaint clusters are likely to lead to NHTSA investigations or recalls.
Score vehicle makes and models by complaint volume, crash involvement, and component failure rates for actuarial models.
Track defect rates by manufacturer, model year, and component system to benchmark safety quality over time.
Analyze injury and death rates by vehicle type, component, and failure mode for public health or litigation research.
Use the ~2.2M free-text complaint narratives as a corpus for named entity recognition, clustering, and topic modeling.
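As a rough illustration of the scoring use case above, the sketch below aggregates complaint volume, crash and fire rates, and injury and death totals per make. It assumes the `crash` and `fire` columns hold 0/1 integers as listed in the Schema section. The function name `score_makes`, its `min_complaints` parameter, and the toy rows are hypothetical and exist only for this example.

```python
import pandas as pd


def score_makes(df: pd.DataFrame, min_complaints: int = 1) -> pd.DataFrame:
    """Summarise complaint volume and severity per make (illustrative only)."""
    summary = df.groupby("make").agg(
        complaints=("cmplid", "count"),  # number of complaint rows
        crash_rate=("crash", "mean"),    # share of complaints with a crash
        fire_rate=("fire", "mean"),      # share of complaints with a fire
        injured=("injured", "sum"),      # total persons injured
        deaths=("deaths", "sum"),        # total deaths
    )
    # Drop makes with too few complaints for a rate to be meaningful.
    summary = summary[summary["complaints"] >= min_complaints]
    return summary.sort_values("complaints", ascending=False)


# Toy rows, invented purely for illustration (not real complaint data).
toy = pd.DataFrame(
    {
        "cmplid": [1, 2, 3, 4],
        "make": ["ALPHA", "ALPHA", "ALPHA", "BETA"],
        "crash": [1, 0, 0, 1],
        "fire": [0, 0, 1, 0],
        "injured": [2, 0, 0, 1],
        "deaths": [0, 0, 0, 0],
    }
)
scores = score_makes(toy)
print(scores)
```

On the real file, the same call would be `score_makes(df, min_complaints=500)` or similar. The threshold is a judgment call, since rates computed from a handful of complaints are noisy.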
Schema
Single table, available as nhtsa_complaints.parquet / nhtsa_complaints.csv: 53 columns, one row per complaint. The component description is split into three hierarchy levels for easier filtering. The table below lists 20 of the 53 columns.
| Field | Type | Description |
|---|---|---|
| cmplid | int | Sequential complaint ID |
| odino | int | NHTSA ODI number |
| mfr_name | string | Manufacturer name |
| make | string | Vehicle make |
| model | string | Vehicle model |
| model_year | int | Vehicle model year |
| fail_date | string | Failure date (YYYY-MM-DD) |
| fail_year | int | Failure year (derived) |
| date_added | string | Date added to ODI (YYYY-MM-DD) |
| crash | int | Crash involved (1/0) |
| fire | int | Fire involved (1/0) |
| injured | int | Number of persons injured |
| deaths | int | Number of deaths |
| comp_desc | string | Full component description |
| comp_system | string | System (e.g. FUEL SYSTEM) |
| comp_component | string | Component (e.g. DELIVERY) |
| comp_part | string | Part (e.g. FUEL PUMP) |
| state | string | US state (2-letter code) |
| miles | int | Mileage at time of failure |
| complaint_desc | string | Full complaint narrative (free text) |
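To show how the three component levels relate to `comp_desc`, here is a minimal sketch of one way such a split could work. It assumes the levels are colon-separated inside `comp_desc`, as they are in NHTSA's raw component descriptions; the actual parsing used to build this dataset is not documented here, and `split_component` is a hypothetical helper name.

```python
from typing import Optional, Tuple

Levels = Tuple[Optional[str], Optional[str], Optional[str]]


def split_component(comp_desc: object) -> Levels:
    """Split 'SYSTEM:COMPONENT:PART' into three levels, padding with None."""
    # Missing values (None, NaN) and blank strings yield three empty levels.
    if not isinstance(comp_desc, str) or not comp_desc.strip():
        return (None, None, None)
    # maxsplit=2 keeps any further colons inside the third level.
    parts = [p.strip() or None for p in comp_desc.split(":", 2)]
    parts += [None] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


# Assumed input format, built from the example values in the table above.
print(split_component("FUEL SYSTEM:DELIVERY:FUEL PUMP"))
print(split_component("FUEL SYSTEM"))
```

In practice the pre-split `comp_system`, `comp_component`, and `comp_part` columns make this unnecessary. The sketch is mainly useful for checking a raw description against the parsed levels.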
Quick Start
```python
import pandas as pd

# Load the full table (one row per complaint)
df = pd.read_parquet("nhtsa_complaints.parquet")

# Complaints involving crashes, by make
crash_df = df[df["crash"] == 1]
print(crash_df["make"].value_counts().head(10))

# Top component systems by complaint volume
print(df["comp_system"].value_counts().head(10))

# Filter to model years 2015 and later
modern = df[df["model_year"] >= 2015]
print(f"{len(modern):,} complaints for 2015+ vehicles")
```
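Because `fail_date` is stored as a `YYYY-MM-DD` string, time-based analysis usually starts by parsing it. The sketch below counts complaints per failure year for one component system. It assumes some rows may have missing or malformed dates, so it coerces those to `NaT` and drops them. `complaints_by_year` and the toy rows are hypothetical names and data used only for illustration.

```python
import pandas as pd


def complaints_by_year(df: pd.DataFrame, system: str) -> pd.Series:
    """Count complaints per failure year for one component system."""
    # errors="coerce" turns missing or unparseable dates into NaT instead of raising.
    fail_dates = pd.to_datetime(df["fail_date"], format="%Y-%m-%d", errors="coerce")
    mask = (df["comp_system"] == system) & fail_dates.notna()
    years = fail_dates[mask].dt.year
    return years.value_counts().sort_index()


# Toy rows, invented purely for illustration (not real complaint data).
toy = pd.DataFrame(
    {
        "fail_date": [
            "2019-03-01", "2019-07-15", "2020-01-02",
            None, "not a date", "2020-05-05",
        ],
        "comp_system": ["FUEL SYSTEM"] * 5 + ["AIR BAGS"],
    }
)
trend = complaints_by_year(toy, "FUEL SYSTEM")
print(trend)
```

The derived `fail_year` column gives the same grouping without parsing. Parsing the date yourself is useful when you need month or quarter granularity.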
Data Provenance
Source: National Highway Traffic Safety Administration (NHTSA), Office of Defects Investigation (ODI)
Portal: NHTSA Vehicle Safety Complaints
Update frequency: Continuous — NHTSA updates the flat file regularly as new complaints are filed. Annual subscribers receive quarterly refreshes.
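If you keep a local copy and receive a newer extract, one way to combine them is sketched below. This is an assumption-heavy example: it assumes `cmplid` uniquely identifies a row and that a refresh file may repeat or update rows you already hold. The delivery format of refreshes is not specified here, and `apply_refresh` is a hypothetical helper.

```python
import pandas as pd


def apply_refresh(current: pd.DataFrame, refresh: pd.DataFrame) -> pd.DataFrame:
    """Merge a newer extract into an existing one, keeping one row per cmplid."""
    combined = pd.concat([current, refresh], ignore_index=True)
    # keep="last" prefers rows from `refresh`, which is concatenated second.
    combined = combined.drop_duplicates(subset="cmplid", keep="last")
    return combined.sort_values("cmplid").reset_index(drop=True)


# Toy rows, invented purely for illustration (not real complaint data).
current = pd.DataFrame({"cmplid": [1, 2], "miles": [100, 200]})
refresh = pd.DataFrame({"cmplid": [2, 3], "miles": [250, 300]})
merged = apply_refresh(current, refresh)
print(merged)
```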
License: NHTSA complaint data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (component parsing, date standardisation, PII removal).
Need custom data cuts, API access, or bulk licensing?
Contact Sales