FDA FAERS Drug Adverse Events 2023

FDA Adverse Event Reporting System — Cleaned & Deduplicated

The FDA Adverse Event Reporting System (FAERS) is the primary source for post-market drug safety surveillance in the United States. This dataset covers all of 2023 — 1.5M+ deduplicated adverse event reports across 7 relational tables, with normalised drug names, MedDRA reaction terms, and expanded outcome codes. Ideal for pharmacovigilance, signal detection, drug safety research, and AI/ML applications.

Sample: Public Domain · Paid: Commercial License · CSV + Parquet · 7 Tables · 2023 · Premium Dataset

Why not just download it from FDA?

You can — it's public domain. Here's what we saved you:

  • Merged and deduplicated all 4 quarterly zip releases into a single consistent dataset using FDA's case deduplication logic
  • All 7 relational tables extracted (Demographics, Drugs, Reactions, Outcomes, Therapy Dates, Indications, Sources) with consistent join keys
  • Added primary_suspect flag to the Drugs table — essential for pharmacovigilance analysis, requires parsing the raw drug_role field
  • Normalised drug names and date fields: FDA's raw ASCII files have encoding quirks and inconsistent date representations
  • Parquet output: FDA distributes dollar-sign-delimited ($) ASCII text files; we convert all 7 tables to columnar format

⏱ Skip ~4–6 hours of merging, deduplication, and format wrangling. FDA's deduplication logic alone requires careful reading of their technical documentation.

What you'd need to do yourself:
  • Download 4 quarterly zip archives from the FDA FAERS portal (Q1–Q4 2023)
  • Each zip contains 7 ASCII text files in dollar-sign-delimited ($) format, with encoding quirks
  • Implement FDA's deduplication logic — documented in their technical specifications, non-trivial to apply correctly
  • Join 7 tables using the correct case/drug/report hierarchies (primaryid, caseid, isi_sadr)
  • Convert all files from dollar-sign-delimited ($) ASCII to a usable analytical format

  • 1.5M+ Adverse Event Reports
  • 7.4M+ Drug Entries
  • 5.8M+ Reaction Terms
  • 7 Tables

Use Cases

Pharmacovigilance & Signal Detection

Identify statistical signals for drug-adverse event associations using disproportionality analysis (PRR, ROR) across 1.5M+ reports. Build real-world safety monitoring pipelines.
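As a rough illustration of disproportionality analysis, the sketch below builds the 2x2 report-count table for one drug/reaction pair and derives the proportional reporting ratio (PRR) and reporting odds ratio (ROR). It assumes the `primaryid`, `drugname`, and `pt` columns used elsewhere on this page; the function name and the toy data are hypothetical. It counts every drug role, applies no confidence intervals or minimum-count thresholds, and is not a validated signal detection method.

```python
import pandas as pd


def disproportionality(drug: pd.DataFrame, reac: pd.DataFrame,
                       drug_name: str, reaction: str) -> dict:
    """Return the 2x2 report counts plus PRR and ROR for one drug/reaction pair."""
    all_reports = set(drug["primaryid"]) | set(reac["primaryid"])
    with_drug = set(drug.loc[drug["drugname"] == drug_name, "primaryid"])
    with_reac = set(reac.loc[reac["pt"] == reaction, "primaryid"])

    a = len(with_drug & with_reac)                # drug and reaction
    b = len(with_drug - with_reac)                # drug, other reactions
    c = len(with_reac - with_drug)                # reaction, other drugs
    d = len(all_reports - with_drug - with_reac)  # neither

    if 0 in (a, b, c, d):
        raise ValueError("PRR/ROR are undefined when a cell of the 2x2 table is empty")

    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    return {"a": a, "b": b, "c": c, "d": d, "prr": prr, "ror": ror}


# Toy data (hypothetical): 10 reports, drug X on reports 1-4, drug Y on 5-10.
toy_drug = pd.DataFrame({
    "primaryid": [str(i) for i in range(1, 11)],
    "drugname": ["X"] * 4 + ["Y"] * 6,
})
toy_reac = pd.DataFrame({
    "primaryid": [str(i) for i in range(1, 11)],
    "pt": ["Nausea", "Nausea", "Headache", "Headache", "Nausea",
           "Headache", "Headache", "Headache", "Headache", "Headache"],
})
result = disproportionality(toy_drug, toy_reac, "X", "Nausea")
print(result)
```

Counting distinct reports rather than rows keeps a report that lists the same drug twice from being counted twice.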

Drug Safety Research

Analyse adverse event patterns by drug class, patient demographics, reporter type, and outcome severity. Supplement clinical trial data with real-world post-market evidence.

NLP on Adverse Event Narratives

7.4M+ drug name entries and 5.8M+ MedDRA reaction terms for named entity recognition, drug normalisation, and adverse event classification model training.

Clinical Trial Design

Use historical adverse event rates to inform safety monitoring plans, define endpoint thresholds, and identify high-risk patient subpopulations for new drug trials.

Insurance & Actuarial Risk

Model drug-related adverse event rates by indication, age group, and outcome severity for pharmaceutical liability and health insurance risk models.

Regulatory Intelligence

Track FDA reporting trends, manufacturer submission rates, and expedited vs. periodic reporting patterns. Benchmark drug safety profiles against the FAERS baseline.

Tables

7 relational tables, all joining on primaryid. Each table is delivered as both .csv and .parquet.

fda_faers_demo.parquet

1.5M+ rows

One row per deduplicated adverse event report. Demographics, reporter type, patient age, sex, weight.

fda_faers_drug.parquet

7.4M+ rows

One row per drug per report. Normalised drug names, active ingredient, drug role (primary/secondary suspect, concomitant).

fda_faers_reac.parquet

5.8M+ rows

One row per adverse reaction per report. MedDRA preferred terms, title-cased for consistency.

fda_faers_outc.parquet

1.2M+ rows

One row per serious outcome per report. Includes Death, Hospitalization, Life-Threatening, Disability, and more.

fda_faers_ther.parquet

2.6M+ rows

Drug therapy start and end dates per report. Duration and unit of measure.

fda_faers_indi.parquet

4.5M+ rows

Drug indication (reason for use) per drug per report. MedDRA preferred terms.

fda_faers_rpsr.parquet

52K+ rows

Report source per report. Manufacturer, health professional, consumer, foreign, literature, study.
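Because the Drugs and Reactions tables each hold many rows per report, a naive join on primaryid can inflate counts. The sketch below shows one way to list the most frequent reactions for a primary-suspect drug while counting each report once per reaction. It assumes the `drugname`, `drug_role`, and `pt` columns shown on this page; `top_reactions` and the toy frames are hypothetical.

```python
import pandas as pd


def top_reactions(drug: pd.DataFrame, reac: pd.DataFrame,
                  drug_name: str, n: int = 10) -> pd.Series:
    """Count reports per reaction term where drug_name is a primary suspect."""
    is_suspect = (drug["drugname"] == drug_name) & (drug["drug_role"] == "Primary Suspect")
    # One row per report, so a drug listed twice on a report is not double counted.
    suspects = drug.loc[is_suspect, ["primaryid"]].drop_duplicates()
    # One row per (report, reaction) pair for the same reason.
    reactions = reac[["primaryid", "pt"]].drop_duplicates()
    joined = suspects.merge(reactions, on="primaryid", how="inner")
    return joined["pt"].value_counts().head(n)


# Toy data (hypothetical values).
toy_drug = pd.DataFrame({
    "primaryid": ["1", "1", "2", "3", "4"],
    "drugname": ["ASPIRIN", "ASPIRIN", "ASPIRIN", "ASPIRIN", "IBUPROFEN"],
    "drug_role": ["Primary Suspect", "Primary Suspect", "Concomitant",
                  "Primary Suspect", "Primary Suspect"],
})
toy_reac = pd.DataFrame({
    "primaryid": ["1", "1", "2", "3", "4"],
    "pt": ["Nausea", "Headache", "Nausea", "Nausea", "Rash"],
})
top = top_reactions(toy_drug, toy_reac, "ASPIRIN")
print(top)
```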

Schema

Demographics Table (fda_faers_demo)

Primary table — one row per deduplicated adverse event report. Deduplication retains the latest case version per caseid.

Field          Type    Description
primaryid      string  Unique FAERS report identifier
caseid         string  Case ID; groups all versions of the same case
caseversion    int     Case version number (deduped: latest version kept per caseid)
fda_dt         string  FDA receipt date (YYYY-MM-DD)
rept_dt        string  Report date (YYYY-MM-DD)
init_fda_dt    string  Initial FDA receipt date (YYYY-MM-DD)
mfr_sndr       string  Manufacturer or sender name
age_years      float   Patient age in years (normalised from age + age_cod)
sex_label      string  Patient sex: Male / Female / Unknown
wt_kg          float   Patient weight in kg (normalised from wt + wt_cod)
reporter_type  string  Reporter occupation: Physician / Pharmacist / Consumer / etc.
report_type    string  Report type: Expedited / Periodic / Direct / Voluntary
occr_country   string  Country where event occurred
_quarter       string  Source data quarter (e.g. 2023Q1)
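To make the age_years derivation concrete, here is a minimal sketch of converting a raw age value and its unit code into years. The unit codes in `AGE_UNIT_TO_YEARS` (DEC, YR, MON, WK, DY, HR) and the 365.25-day year are assumptions about the raw files, not a description of the pipeline behind this dataset; unparseable values and unknown codes become NaN rather than being guessed.

```python
import pandas as pd

# Assumed raw unit codes and their conversion factors to years.
AGE_UNIT_TO_YEARS = {
    "DEC": 10.0,               # decades
    "YR": 1.0,                 # years
    "MON": 1.0 / 12.0,         # months
    "WK": 7.0 / 365.25,        # weeks
    "DY": 1.0 / 365.25,        # days
    "HR": 1.0 / (365.25 * 24), # hours
}


def age_in_years(age: pd.Series, age_cod: pd.Series) -> pd.Series:
    """Convert raw age values to years; missing or unknown inputs yield NaN."""
    value = pd.to_numeric(age, errors="coerce")  # non-numeric text becomes NaN
    factor = age_cod.map(AGE_UNIT_TO_YEARS)      # unknown or missing codes become NaN
    return value * factor


# Toy data (hypothetical values).
raw_age = pd.Series(["34", "6", "2", "abc", "50"])
raw_cod = pd.Series(["YR", "MON", "DEC", "YR", None])
ages = age_in_years(raw_age, raw_cod)
print(ages)
```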

Drug Table (fda_faers_drug)

Field         Type    Description
primaryid     string  Links to demo table
caseid        string  Case ID
drug_seq      string  Drug sequence number within the case
drugname      string  Drug name as reported (uppercase normalised)
prod_ai       string  Active ingredient name (uppercase normalised)
drug_role     string  Drug role: Primary Suspect / Secondary Suspect / Concomitant / Interacting
dose_vbm      string  Dose, route, and frequency as reported
cum_dose_chr  string  Cumulative dose to first event
dechal        string  Dechallenge result (effect stopped on withdrawal)
rechallenge   string  Rechallenge result (effect recurred on re-exposure)
nda_num       string  FDA NDA/BLA number
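For readers working from the raw files, the sketch below shows how short role codes might be expanded into the drug_role labels above, together with a boolean primary-suspect flag. The raw codes (PS, SS, C, I), the `role_cod` input name, and the `expand_drug_role` helper are assumptions for illustration; the delivered table already contains the expanded labels.

```python
import pandas as pd

# Assumed raw role codes mapped to the labels used in the drug_role column.
ROLE_LABELS = {
    "PS": "Primary Suspect",
    "SS": "Secondary Suspect",
    "C": "Concomitant",
    "I": "Interacting",
}


def expand_drug_role(role_cod: pd.Series) -> pd.DataFrame:
    """Return the readable role label and a primary-suspect flag for each raw code."""
    cleaned = role_cod.str.strip().str.upper()  # tolerate stray whitespace and case
    return pd.DataFrame({
        "drug_role": cleaned.map(ROLE_LABELS),  # unknown or missing codes stay NaN
        "primary_suspect": cleaned.eq("PS"),    # missing codes compare as False
    })


# Toy data (hypothetical values).
roles = expand_drug_role(pd.Series(["PS", " ss", "C", None, "I"]))
print(roles)
```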

Quick Start

import pandas as pd

demo = pd.read_parquet("fda_faers_demo.parquet")
drug = pd.read_parquet("fda_faers_drug.parquet")
reac = pd.read_parquet("fda_faers_reac.parquet")
outc = pd.read_parquet("fda_faers_outc.parquet")

# Age distribution of patients (age_years is the patient's age, not the reporter's)
print(demo["age_years"].describe())

# Most-reported drugs (primary suspects only)
primary_drugs = drug[drug["drug_role"] == "Primary Suspect"]
print(primary_drugs["drugname"].value_counts().head(20))

# Most common adverse reactions
print(reac["pt"].value_counts().head(20))

# Outcomes breakdown
print(outc["outcome"].value_counts())

# Reports involving death
deaths = outc[outc["outc_cod"] == "DE"]["primaryid"]
fatal_reports = demo[demo["primaryid"].isin(deaths)]
print(f"{len(fatal_reports):,} reports with fatal outcomes")

Pairs Well With

External: openFDA Drug Label API

Join FAERS drug names against the FDA drug label database (openFDA) to map brand names to active ingredients, NDC codes, and approved indications for enriched signal analysis.

External: MedDRA Hierarchy

Map FAERS preferred terms (PT) up the MedDRA hierarchy to high-level group terms (HLGT) and system organ classes (SOC) for aggregate pharmacovigilance analysis.
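A minimal sketch of that roll-up, assuming you hold a licensed MedDRA extract loaded as a two-column lookup: the `pt_to_soc` frame, its `pt` and `soc` column names, and the `reactions_by_soc` helper are hypothetical, and the lookup is assumed to carry one (primary) SOC per preferred term. Terms are matched case-insensitively because the Reactions table is title-cased.

```python
import pandas as pd


def reactions_by_soc(reac: pd.DataFrame, pt_to_soc: pd.DataFrame) -> pd.Series:
    """Count distinct reports per system organ class; unmatched terms go to 'Unmapped'."""
    lookup = (
        pt_to_soc.assign(pt_key=pt_to_soc["pt"].str.strip().str.casefold())
        [["pt_key", "soc"]]
        .drop_duplicates("pt_key")  # guard against duplicate lookup rows
    )
    keyed = reac.assign(pt_key=reac["pt"].str.strip().str.casefold())
    merged = keyed.merge(lookup, on="pt_key", how="left")
    merged["soc"] = merged["soc"].fillna("Unmapped")
    return merged.groupby("soc")["primaryid"].nunique().sort_values(ascending=False)


# Toy data (hypothetical values, not real MedDRA content beyond well-known terms).
toy_reac = pd.DataFrame({
    "primaryid": ["1", "1", "2", "3", "4"],
    "pt": ["Nausea", "Vomiting", "Headache", "Nausea", "Made Up Term"],
})
toy_lookup = pd.DataFrame({
    "pt": ["NAUSEA", "vomiting", "Headache"],
    "soc": ["Gastrointestinal disorders", "Gastrointestinal disorders",
            "Nervous system disorders"],
})
soc_counts = reactions_by_soc(toy_reac, toy_lookup)
print(soc_counts)
```

Keeping an explicit "Unmapped" bucket makes it easy to see how many reports fall outside the lookup instead of silently dropping them.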

Pricing

Sample: Free
1,000 rows (Demographics table, CSV) + schema docs
Public Domain

Complete: $149
All 7 tables, 1.5M+ reports (2023), CSV + Parquet
Commercial License

Annual: $299/yr
Full dataset + quarterly updates as FDA publishes new reports
Commercial License

Data Provenance

Source: US Food and Drug Administration (FDA), Adverse Event Reporting System (FAERS)

Portal: FDA FAERS Quarterly Data Files

Coverage: Full year 2023 (Q1–Q4). Quarterly ZIP files parsed, concatenated, cleaned, and deduplicated.

Deduplication: FAERS cases can be re-reported across quarters. Deduplication retains the latest version of each case (by caseid + caseversion), which removes roughly 8% of submissions as duplicates.
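The "latest version per case" rule can be sketched in a few lines of pandas. This is a simplified illustration on toy data, not the full pipeline: `keep_latest_version` is a hypothetical helper, and it assumes caseversion has already been cast to an integer so that version 10 sorts after version 9.

```python
import pandas as pd


def keep_latest_version(demo_raw: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per caseid: the row with the highest caseversion."""
    ordered = demo_raw.sort_values(["caseid", "caseversion"])
    return ordered.drop_duplicates(subset="caseid", keep="last").reset_index(drop=True)


# Toy data (hypothetical): case 100 appears in two quarters, case 300 in three.
raw = pd.DataFrame({
    "primaryid": ["1001", "1002", "2001", "3001", "3002", "3003"],
    "caseid": ["100", "100", "200", "300", "300", "300"],
    "caseversion": [1, 2, 1, 1, 2, 3],
    "_quarter": ["2023Q1", "2023Q3", "2023Q1", "2023Q1", "2023Q2", "2023Q4"],
})
deduped = keep_latest_version(raw)
print(deduped)
```

The surviving primaryid values can then be used to filter the other six tables so that they only reference retained reports.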

Drug name normalisation: Drug names are uppercased, stripped of trailing punctuation, and standardised. Active ingredients (prod_ai) are similarly normalised.

Update frequency: FDA publishes new quarterly files approximately 3 months after each quarter ends. Annual subscribers receive quarterly refreshes.

License: FDA FAERS data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (deduplication, normalisation, derived fields, Parquet conversion).

Need multi-year coverage, custom data cuts, or bulk licensing?

Contact Sales