
NHTSA Vehicle Complaints 1995–Present

Consumer Vehicle Safety Complaint Database

Every consumer vehicle safety complaint filed with the NHTSA Office of Defects Investigation since 1995, cleaned, structured, and enriched with parsed component hierarchies. About 2.2 million records covering all makes, models, and model years, each with a free-text complaint narrative.

Sample: Public Domain | Paid: Commercial License | CSV + Parquet | ~2.2M records | 1995–Present

Complaints: ~2.2M
Years: 30+
Makes: 100+
Components: 500+

Use Cases

Vehicle Defect Detection

Train NLP models on complaint narratives to identify emerging defect patterns before formal investigations are opened.
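As a starting point before any model training, the sketch below compares how often each term appears in a recent batch of narratives versus an older baseline. It is a plain frequency comparison, not a trained NLP model, and the names `term_counts` and `emerging_terms`, the 3-letter token rule, and the add-one smoothing are all illustrative assumptions rather than part of the dataset.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]{3,}")  # lowercase alphabetic tokens, 3+ letters


def term_counts(narratives):
    """Count in how many narratives each term appears (document frequency)."""
    counts = Counter()
    for text in narratives:
        counts.update(set(TOKEN_RE.findall(str(text).lower())))
    return counts


def emerging_terms(baseline, recent, min_recent=2, top_n=10):
    """Rank terms by how much more common they are in `recent` than in `baseline`."""
    base_counts, recent_counts = term_counts(baseline), term_counts(recent)
    n_base, n_recent = max(len(baseline), 1), max(len(recent), 1)
    scored = []
    for term, count in recent_counts.items():
        if count < min_recent:
            continue  # ignore terms too rare to be a pattern
        recent_rate = count / n_recent
        # Add-one smoothing so terms unseen in the baseline do not divide by zero.
        base_rate = (base_counts.get(term, 0) + 1) / (n_base + 1)
        scored.append((term, recent_rate / base_rate))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

With the full table loaded, `baseline` and `recent` could be two slices of `complaint_desc` split on `date_added`, with missing narratives dropped first. High-scoring terms are only candidates for review; they say nothing on their own about whether a defect exists.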

Recall Prediction

Build classifiers to predict which complaint clusters are likely to lead to NHTSA investigations or recalls.

Insurance Risk Scoring

Score vehicle makes and models by complaint volume, crash involvement, and component failure rates for actuarial models.
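A minimal sketch of the aggregation this implies, using only columns from the schema. The function name `complaint_summary` and the `min_complaints` threshold are illustrative assumptions. Note that the rates are per complaint, not per vehicle on the road: the dataset has no sales or registration counts, so any exposure adjustment would need an external source.

```python
import pandas as pd


def complaint_summary(df: pd.DataFrame, min_complaints: int = 1) -> pd.DataFrame:
    """Aggregate complaint counts and severity indicators per make and model."""
    summary = (
        df.groupby(["make", "model"])
        .agg(
            complaints=("cmplid", "count"),  # number of complaint rows
            crash_rate=("crash", "mean"),    # share of complaints reporting a crash
            fire_rate=("fire", "mean"),      # share of complaints reporting a fire
            injured=("injured", "sum"),
            deaths=("deaths", "sum"),
        )
        .reset_index()
    )
    # Drop make/model pairs with too few complaints for the rates to mean much.
    summary = summary[summary["complaints"] >= min_complaints]
    return summary.sort_values("complaints", ascending=False, ignore_index=True)
```

Calling `complaint_summary(df, min_complaints=50)` on the full table would keep only make/model pairs with at least 50 complaints; the threshold is a judgment call, not a recommendation from the data provider.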

Automotive Quality Analysis

Track defect rates by manufacturer, model year, and component system to benchmark safety quality over time.
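One way to approach this is a model-year by component-system table for a single make, sketched below. The function names are illustrative, and the exact spelling or casing of `make` values is an assumption to check against the data. The output is complaint counts and within-year shares; a true defect rate would need production or registration volumes, which this dataset does not include.

```python
import pandas as pd


def complaints_by_year_and_system(df: pd.DataFrame, make: str) -> pd.DataFrame:
    """Count complaints per model year (rows) and component system (columns) for one make."""
    subset = df[df["make"] == make]
    return pd.crosstab(subset["model_year"], subset["comp_system"])


def system_share(counts: pd.DataFrame) -> pd.DataFrame:
    """Convert counts to each system's share of complaints within a model year."""
    return counts.div(counts.sum(axis=1), axis=0)
```

Comparing shares rather than raw counts makes model years with very different complaint volumes easier to read side by side.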

Consumer Safety Research

Analyze injury and death rates by vehicle type, component, and failure mode for public health or litigation research.

NLP & Text Mining

2.2M complaint narratives in free text — a rich corpus for named entity recognition, clustering, and topic modeling.

Schema

Single table: nhtsa_complaints.parquet / nhtsa_complaints.csv. 53 columns, one row per complaint. The component description is split into three hierarchy levels (system, component, part) for easier filtering. The table below lists 20 of the 53 columns.

Field            Type     Description
cmplid           int      Sequential complaint ID
odino            int      NHTSA ODI number
mfr_name         string   Manufacturer name
make             string   Vehicle make
model            string   Vehicle model
model_year       int      Vehicle model year
fail_date        string   Failure date (YYYY-MM-DD)
fail_year        int      Failure year (derived)
date_added       string   Date added to ODI (YYYY-MM-DD)
crash            int      Crash involved (1/0)
fire             int      Fire involved (1/0)
injured          int      Number of persons injured
deaths           int      Number of deaths
comp_desc        string   Full component description
comp_system      string   System (e.g. FUEL SYSTEM)
comp_component   string   Component (e.g. DELIVERY)
comp_part        string   Part (e.g. FUEL PUMP)
state            string   US state (2-letter code)
miles            int      Mileage at time of failure
complaint_desc   string   Full complaint narrative (free text)
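The three hierarchy columns can be combined into a simple filter, sketched below. The helper name `filter_components` is hypothetical, and the case-insensitive, whitespace-trimmed matching is an assumption about how the values might vary, not documented behaviour of the dataset.

```python
import pandas as pd


def filter_components(df, system=None, component=None, part=None):
    """Return rows matching every hierarchy level that is given (None means "any")."""
    mask = pd.Series(True, index=df.index)
    for column, wanted in (
        ("comp_system", system),
        ("comp_component", component),
        ("comp_part", part),
    ):
        if wanted is not None:
            # Compare case-insensitively and ignore surrounding whitespace.
            mask &= df[column].str.strip().str.upper().eq(wanted.strip().upper())
    return df[mask]
```

For example, `filter_components(df, system="FUEL SYSTEM", part="FUEL PUMP")` would keep fuel pump complaints, using the example values from the schema. Rows where a requested level is missing are excluded.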

Quick Start

import pandas as pd

df = pd.read_parquet("nhtsa_complaints.parquet")

# Complaints involving crashes by make
crash_df = df[df["crash"] == 1]
print(crash_df["make"].value_counts().head(10))

# Top component systems by complaint volume
print(df["comp_system"].value_counts().head(10))

# Filter by model year range
modern = df[df["model_year"] >= 2015]
print(f"{len(modern):,} complaints for 2015+ vehicles")
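Because the schema stores `fail_date` and `date_added` as strings, date arithmetic needs a conversion step first. The sketch below parses both columns and derives the number of days between failure and filing; the helper name `add_report_lag` and the new column `report_lag_days` are illustrative and not part of the dataset.

```python
import pandas as pd


def add_report_lag(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with parsed dates and the days between failure and filing."""
    out = df.copy()
    for column in ("fail_date", "date_added"):
        # Unparseable or missing values become NaT instead of raising.
        out[column] = pd.to_datetime(out[column], format="%Y-%m-%d", errors="coerce")
    out["report_lag_days"] = (out["date_added"] - out["fail_date"]).dt.days
    return out
```

Rows with a missing or malformed date end up with a missing lag rather than an error, so they can be counted or dropped explicitly afterwards.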

Pricing

Tier       Price     Includes                                                        License
Sample     Free      1,000 rows (CSV) + schema docs                                  Public Domain
Complete   $79       Full dataset: 2.2M complaints, CSV + Parquet                    Commercial License
Annual     $149/yr   Full dataset + quarterly updates as NHTSA adds new complaints   Commercial License

Data Provenance

Source: National Highway Traffic Safety Administration (NHTSA), Office of Defects Investigation (ODI)

Portal: NHTSA Vehicle Safety Complaints

Update frequency: Continuous — NHTSA updates the flat file regularly as new complaints are filed. Annual subscribers receive quarterly refreshes.
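For subscribers comparing one quarterly refresh with the previous one, a sketch like the following isolates the complaints that are new. It assumes `cmplid` values stay stable between refreshes, which this page does not state, so treat that as something to verify; the function name `new_complaints` is hypothetical.

```python
import pandas as pd


def new_complaints(previous: pd.DataFrame, current: pd.DataFrame) -> pd.DataFrame:
    """Return rows of `current` whose cmplid does not appear in `previous`."""
    # ASSUMPTION: cmplid identifies the same complaint in both releases.
    is_new = ~current["cmplid"].isin(previous["cmplid"])
    return current[is_new].reset_index(drop=True)
```

If the IDs turn out not to be stable, filtering `current` on `date_added` later than the previous release date would be an alternative.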

License: NHTSA complaint data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (component parsing, date standardisation, PII removal).

Need custom data cuts, API access, or bulk licensing?

Contact Sales