
NHTSA Vehicle Complaints 1995–Present

Consumer Vehicle Safety Complaint Database

Every consumer vehicle safety complaint filed with the NHTSA Office of Defects Investigation since 1995 — cleaned, structured, and enriched with parsed component hierarchies. 2.2 million records covering all makes, models, and model years with free-text complaint narratives.

Sample: Public Domain · Paid: Commercial License · CSV + Parquet · ~2.2M records · 1995–Present

Why not just download it from NHTSA?

You can — it's public domain. Here's what we saved you:

  • Parsed component descriptions into 3 structured hierarchy levels (comp_system, comp_component, comp_part) — the raw source is one concatenated string
  • Standardized all date fields to ISO 8601 — NHTSA's source uses inconsistent date formats
  • Normalized make/model text for consistent filtering across 30 years of free-text manufacturer entries
  • Added a derived field: fail_year, extracted from the failure date for easy time-series grouping
  • Pairs with Recalls — cleaned to a common make/model/year schema for joining complaint volume to recall campaigns

⏱ Skip ~2–3 hours of parsing and normalization. The component field alone requires custom parsing logic.
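As a rough illustration of what that parsing involves, here is a minimal sketch that splits a concatenated component string into the three levels named above. The colon separator, the `split_component` helper, and the choice to fold any extra levels into `comp_part` are assumptions made for this example, not a description of our production pipeline.

```python
from typing import NamedTuple, Optional


class ComponentLevels(NamedTuple):
    comp_system: Optional[str]
    comp_component: Optional[str]
    comp_part: Optional[str]


def split_component(comp_desc: Optional[str], sep: str = ":") -> ComponentLevels:
    """Split a concatenated component string into up to three levels.

    Assumes the levels are joined by `sep` (a colon here, which is an
    assumption). Levels beyond the third are folded back into comp_part.
    """
    if comp_desc is None:
        return ComponentLevels(None, None, None)

    # Drop empty segments caused by stray or trailing separators.
    parts = [p.strip() for p in comp_desc.split(sep) if p.strip()]
    if not parts:
        return ComponentLevels(None, None, None)

    system = parts[0]
    component = parts[1] if len(parts) > 1 else None
    part = sep.join(parts[2:]) if len(parts) > 2 else None
    return ComponentLevels(system, component, part)


# Illustrative input built from the example values in the schema.
print(split_component("FUEL SYSTEM:DELIVERY:FUEL PUMP"))
```

Strings with fewer than three levels leave the lower levels empty (`None`), which keeps filtering on `comp_system` reliable even when the source entry is coarse.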

What you'd need to do yourself:
  • Download the raw flat file from NHTSA (a large pipe-delimited text file with encoding quirks)
  • Parse the COMPDESC field — a concatenated hierarchical string — into usable categorical columns
  • Handle encoding issues and special characters in the 2.2M complaint narratives
  • Standardize date fields that changed format over 30 years of data collection
  • Build your own Parquet conversion pipeline
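To give a sense of the date step in that list, the sketch below tries a few candidate formats and emits ISO 8601 strings, plus a derived year. The format list in `CANDIDATE_FORMATS` and the helper names `to_iso_date` and `year_of` are assumptions for illustration; the formats actually present in the raw file may differ.

```python
from datetime import datetime
from typing import Optional

# Candidate input formats. This list is an assumption for illustration,
# not an inventory of what the raw file actually contains.
CANDIDATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%m/%d/%Y")


def to_iso_date(raw: Optional[str]) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed."""
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def year_of(iso_date: Optional[str]) -> Optional[int]:
    """Derive a year (as in fail_year) from an ISO 8601 date string."""
    return int(iso_date[:4]) if iso_date else None


for raw in ("20150314", "03/14/2015", "2015-03-14", ""):
    iso = to_iso_date(raw)
    print(repr(raw), "->", iso, year_of(iso))
```

Returning `None` for unparseable values, rather than guessing, keeps bad dates visible so they can be counted and reviewed.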

~2.2M Complaints

30+ Years

100+ Makes

500+ Components

Use Cases

Vehicle Defect Detection

Train NLP models on complaint narratives to identify emerging defect patterns before formal investigations are opened.

Recall Prediction

Build classifiers to predict which complaint clusters are likely to lead to NHTSA investigations or recalls.

Insurance Risk Scoring

Score vehicle makes and models by complaint volume, crash involvement, and component failure rates for actuarial models.

Automotive Quality Analysis

Track defect rates by manufacturer, model year, and component system to benchmark safety quality over time.

Consumer Safety Research

Analyze injury and death rates by vehicle type, component, and failure mode for public health or litigation research.

NLP & Text Mining

2.2M complaint narratives in free text — a rich corpus for named entity recognition, clustering, and topic modeling.

Schema

Single table: nhtsa_complaints.parquet / nhtsa_complaints.csv, with 53 columns and one row per complaint. The component description is split into 3 hierarchy levels for easier filtering. The fields listed below are a subset of those columns.

Field            Type     Description
cmplid           int      Sequential complaint ID
odino            int      NHTSA ODI number
mfr_name         string   Manufacturer name
make             string   Vehicle make
model            string   Vehicle model
model_year       int      Vehicle model year
fail_date        string   Failure date (YYYY-MM-DD)
fail_year        int      Failure year (derived)
date_added       string   Date added to ODI (YYYY-MM-DD)
crash            int      Crash involved (1/0)
fire             int      Fire involved (1/0)
injured          int      Number of persons injured
deaths           int      Number of deaths
comp_desc        string   Full component description
comp_system      string   System (e.g. FUEL SYSTEM)
comp_component   string   Component (e.g. DELIVERY)
comp_part        string   Part (e.g. FUEL PUMP)
state            string   US state (2-letter code)
miles            int      Mileage at time of failure
complaint_desc   string   Full complaint narrative (free text)

Quick Start

import pandas as pd

df = pd.read_parquet("nhtsa_complaints.parquet")

# Complaints involving crashes by make
crash_df = df[df["crash"] == 1]
print(crash_df["make"].value_counts().head(10))

# Top component systems by complaint volume
print(df["comp_system"].value_counts().head(10))

# Filter by model year range
modern = df[df["model_year"] >= 2015]
print(f"{len(modern):,} complaints for 2015+ vehicles")
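The derived fail_year column is meant for time-series grouping, so a sketch of that pattern may be useful. The example below uses a tiny made-up frame with the schema's column names so it runs without the dataset; the rows and the `MAKE_A` / `MAKE_B` labels are placeholders, not real complaint counts. With the full file you would start from `pd.read_parquet("nhtsa_complaints.parquet")` instead.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the schema.
df = pd.DataFrame(
    {
        "make": ["MAKE_A", "MAKE_A", "MAKE_B", "MAKE_B", "MAKE_B"],
        "fail_year": [2018, 2019, 2018, 2018, 2019],
        "crash": [1, 0, 0, 1, 0],
    }
)

# Complaint count and crash count per make and failure year.
yearly = (
    df.groupby(["make", "fail_year"])
    .agg(complaints=("crash", "size"), crashes=("crash", "sum"))
    .reset_index()
)

# Share of complaints in each group that involved a crash.
yearly["crash_share"] = yearly["crashes"] / yearly["complaints"]
print(yearly)
```

Keeping counts and shares side by side helps avoid over-reading a high crash share that rests on only a handful of complaints.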

Pricing

Sample

Free

1,000 rows (CSV) + schema docs

Public Domain

Complete

$79

Full dataset — 2.2M complaints, CSV + Parquet

Commercial License

Annual

$149/yr

Full dataset + quarterly updates as NHTSA adds new complaints

Commercial License


Data Provenance

Source: National Highway Traffic Safety Administration (NHTSA), Office of Defects Investigation (ODI)

Portal: NHTSA Vehicle Safety Complaints

Update frequency: Continuous — NHTSA updates the flat file regularly as new complaints are filed. Annual subscribers receive quarterly refreshes.

License: NHTSA complaint data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License, which covers our pipeline and enrichment work (component parsing, date standardization, PII removal).

Need custom data cuts, API access, or bulk licensing?

Contact Sales