FDA FAERS Drug Adverse Events 2023
FDA Adverse Event Reporting System — Cleaned & Deduplicated
The FDA Adverse Event Reporting System (FAERS) is the primary source for post-market drug safety surveillance in the United States. This dataset covers all of 2023 — 1.5M+ deduplicated adverse event reports across 7 relational tables, with normalised drug names, MedDRA reaction terms, and expanded outcome codes. Ideal for pharmacovigilance, signal detection, drug safety research, and AI/ML applications.
Why not just download it from FDA?
You can — it's public domain. Here's what we saved you:
- ✓ Merged and deduplicated all 4 quarterly zip releases into a single consistent dataset using FDA's case deduplication logic
- ✓ All 7 relational tables extracted (Demographics, Drugs, Reactions, Outcomes, Therapy Dates, Indications, Sources) with consistent join keys
- ✓ Added a primary_suspect flag to the Drugs table: essential for pharmacovigilance analysis, and it requires parsing the raw drug_role field
- ✓ Normalized drug names and date fields: FDA's raw ASCII files have encoding quirks and inconsistent date representations
- ✓ Parquet output: FDA distributes dollar-sign ($) delimited ASCII text files; we convert all 7 tables to columnar format
⏱ Skip ~4–6 hours of merging, deduplication, and format wrangling. FDA's deduplication logic alone requires careful reading of their technical documentation.
What you'd need to do yourself:
- Download 4 quarterly zip archives from the FDA FAERS portal (Q1–Q4 2023)
- Each zip contains 7 ASCII text files in dollar-sign ($) delimited format, with encoding quirks
- Implement FDA's deduplication logic — documented in their technical specifications, non-trivial to apply correctly
- Join 7 tables using the correct case/drug/report hierarchies (primaryid, caseid, isi_sadr)
- Convert all files from delimited ASCII text to a usable analytical format
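The parsing and date-cleaning steps in the list above can be sketched with the standard library alone. This is a minimal illustration, not the pipeline used to build this dataset: the function names, the sample rows, and the choice to keep partial dates at their original precision are all assumptions. The delimiter is a parameter; the public FAERS ASCII releases use `$`.

```python
import csv
import io


def read_faers_table(text, delimiter="$"):
    """Parse one delimited FAERS text file into a list of dict rows."""
    # QUOTE_NONE: treat quote characters inside free-text fields as plain data.
    reader = csv.DictReader(
        io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    return [dict(row) for row in reader]


def normalise_date(raw):
    """Turn YYYYMMDD, YYYYMM or YYYY into a dashed string; None if unusable."""
    raw = (raw or "").strip()
    if not raw.isdigit():
        return None
    if len(raw) == 8:
        return f"{raw[:4]}-{raw[4:6]}-{raw[6:]}"
    if len(raw) == 6:
        return f"{raw[:4]}-{raw[4:]}"  # assumption: keep month precision
    if len(raw) == 4:
        return raw  # assumption: keep year precision
    return None


# Hypothetical sample rows, shaped like a raw demographics file.
sample = (
    "primaryid$caseid$caseversion$event_dt\n"
    "1000011$100001$1$20230115\n"
    "1000022$100002$2$202302\n"
    "1000031$100003$1$\n"
)

rows = read_faers_table(sample)
for row in rows:
    row["event_dt"] = normalise_date(row["event_dt"])
    print(row["primaryid"], row["event_dt"])
```

A real run would read each quarterly file from disk with the encoding that file actually uses, then concatenate the four quarters before deduplicating.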
- 1.5M+ adverse event reports
- 7.4M+ drug entries
- 5.8M+ reaction terms
- 7 tables
Use Cases
Identify statistical signals for drug-adverse event associations using disproportionality analysis (PRR, ROR) across 1.5M+ reports. Build real-world safety monitoring pipelines.
Analyse adverse event patterns by drug class, patient demographics, reporter type, and outcome severity. Supplement clinical trial data with real-world post-market evidence.
7.4M+ drug name entries and 5.8M+ MedDRA reaction terms for named entity recognition, drug normalisation, and adverse event classification model training.
Use historical adverse event rates to inform safety monitoring plans, define endpoint thresholds, and identify high-risk patient subpopulations for new drug trials.
Model drug-related adverse event rates by indication, age group, and outcome severity for pharmaceutical liability and health insurance risk models.
Track FDA reporting trends, manufacturer submission rates, and expedited vs. periodic reporting patterns. Benchmark drug safety profiles against the FAERS baseline.
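The disproportionality measures named in the first use case (PRR and ROR) are both computed from a 2x2 table of report counts. The sketch below uses the standard textbook definitions; the counts are invented for illustration and the function names are hypothetical, not part of this dataset.

```python
import math


def prr(a, b, c, d):
    """Proportional reporting ratio.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio for the same 2x2 table."""
    return (a * d) / (b * c)


def ror_ci95(a, b, c, d):
    """Approximate 95% confidence interval for the ROR (log-normal)."""
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ror = math.log(ror(a, b, c, d))
    return math.exp(log_ror - 1.96 * se), math.exp(log_ror + 1.96 * se)


# Invented counts, for illustration only.
a, b, c, d = 20, 80, 100, 9800
low, high = ror_ci95(a, b, c, d)
print(f"PRR={prr(a, b, c, d):.2f} ROR={ror(a, b, c, d):.2f} CI=({low:.2f}, {high:.2f})")
```

In practice the four counts would come from joining the drug and reaction tables on primaryid, usually restricted to primary suspect drugs. Cells with zero counts need separate handling, since both ratios divide by them.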
Tables
7 relational tables, all joining on primaryid. Each table is delivered as both .csv and .parquet.
| Table | Rows | Description |
|---|---|---|
| Demographics | 1.5M+ | One row per deduplicated adverse event report. Demographics, reporter type, patient age, sex, weight. |
| Drugs | 7.4M+ | One row per drug per report. Normalised drug names, active ingredient, drug role (primary/secondary suspect, concomitant). |
| Reactions | 5.8M+ | One row per adverse reaction per report. MedDRA preferred terms, title-cased for consistency. |
| Outcomes | 1.2M+ | One row per serious outcome per report. Includes Death, Hospitalization, Life-Threatening, Disability, and more. |
| Therapy Dates | 2.6M+ | Drug therapy start and end dates per report. Duration and unit of measure. |
| Indications | 4.5M+ | Drug indication (reason for use) per drug per report. MedDRA preferred terms. |
| Sources | 52K+ | Report source per report. Manufacturer, health professional, consumer, foreign, literature, study. |
Schema
Demographics Table (fda_faers_demo)
Primary table — one row per deduplicated adverse event report. Deduplication retains the latest case version per caseid.
| Field | Type | Description |
|---|---|---|
| primaryid | string | Unique FAERS report identifier |
| caseid | string | Case ID — groups all versions of the same case |
| caseversion | int | Case version number (deduped: latest version kept per caseid) |
| fda_dt | string | FDA receipt date (YYYY-MM-DD) |
| rept_dt | string | Report date (YYYY-MM-DD) |
| init_fda_dt | string | Initial FDA receipt date (YYYY-MM-DD) |
| mfr_sndr | string | Manufacturer or sender name |
| age_years | float | Patient age in years (normalised from age + age_cod) |
| sex_label | string | Patient sex: Male / Female / Unknown |
| wt_kg | float | Patient weight in kg (normalised from wt + wt_cod) |
| reporter_type | string | Reporter occupation: Physician / Pharmacist / Consumer / etc. |
| report_type | string | Report type: Expedited / Periodic / Direct / Voluntary |
| occr_country | string | Country where event occurred |
| _quarter | string | Source data quarter (e.g. 2023Q1) |
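The age_years and wt_kg fields are described as normalised from a value plus a unit code. One way such a conversion might look is sketched below. The unit codes are the ones listed in FDA's documentation for the raw files (DEC, YR, MON, WK, DY, HR for age; KG, LBS, GMS for weight); the conversion factors, function names, and the choice to return None for unparseable input are assumptions, not the dataset's actual logic.

```python
# Conversion factors from each raw unit code to the target unit.
AGE_UNIT_TO_YEARS = {
    "DEC": 10.0,
    "YR": 1.0,
    "MON": 1 / 12,
    "WK": 7 / 365.25,
    "DY": 1 / 365.25,
    "HR": 1 / (365.25 * 24),
}
WEIGHT_UNIT_TO_KG = {"KG": 1.0, "LBS": 0.45359237, "GMS": 0.001}


def _to_float(raw):
    """Parse a raw text value; None if blank or not numeric."""
    try:
        return float(str(raw).strip())
    except (TypeError, ValueError):
        return None


def _convert(raw_value, raw_unit, factors):
    value = _to_float(raw_value)
    factor = factors.get((raw_unit or "").strip().upper())
    if value is None or factor is None:
        return None  # assumption: unknown units are left missing
    return value * factor


def age_in_years(age, age_cod):
    return _convert(age, age_cod, AGE_UNIT_TO_YEARS)


def weight_in_kg(wt, wt_cod):
    return _convert(wt, wt_cod, WEIGHT_UNIT_TO_KG)


print(age_in_years("6", "MON"), age_in_years("3", "DEC"), weight_in_kg("150", "LBS"))
```

Returning None rather than guessing a unit keeps missing values visible downstream, which matters when age or weight is used to stratify reports.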
Drug Table (fda_faers_drug)
| Field | Type | Description |
|---|---|---|
| primaryid | string | Links to demo table |
| caseid | string | Case ID |
| drug_seq | string | Drug sequence number within the case |
| drugname | string | Drug name as reported (uppercase normalised) |
| prod_ai | string | Active ingredient name (uppercase normalised) |
| drug_role | string | Drug role: Primary Suspect / Secondary Suspect / Concomitant / Interacting |
| dose_vbm | string | Dose, route, and frequency as reported |
| cum_dose_chr | string | Cumulative dose to first event |
| dechal | string | Dechallenge result (effect stopped on withdrawal) |
| rechallenge | string | Rechallenge result (effect recurred on re-exposure) |
| nda_num | string | FDA NDA/BLA number |
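The drug_role labels in the table above correspond to short codes in FDA's raw drug files (PS, SS, C, I in the role_cod column). A minimal sketch of expanding those codes and deriving a primary-suspect flag follows. The "Unknown" fallback, the function names, and the exact name and type of the derived column are assumptions; the schema above does not show how the flag is stored.

```python
# Raw role codes from FDA's drug file mapped to the labels used in drug_role.
ROLE_LABELS = {
    "PS": "Primary Suspect",
    "SS": "Secondary Suspect",
    "C": "Concomitant",
    "I": "Interacting",
}


def expand_role(role_cod):
    """Map a raw role code to its label; 'Unknown' is a hypothetical fallback."""
    code = (role_cod or "").strip().upper()
    return ROLE_LABELS.get(code, "Unknown")


def add_role_fields(drug_rows):
    """Return copies of the rows with drug_role and primary_suspect added."""
    enriched = []
    for row in drug_rows:
        label = expand_role(row.get("role_cod"))
        enriched.append(
            {**row, "drug_role": label, "primary_suspect": label == "Primary Suspect"}
        )
    return enriched


# Hypothetical raw rows.
raw_drugs = [
    {"primaryid": "1000011", "drug_seq": "1", "drugname": "DRUG A", "role_cod": "PS"},
    {"primaryid": "1000011", "drug_seq": "2", "drugname": "DRUG B", "role_cod": "c"},
]
for row in add_role_fields(raw_drugs):
    print(row["drugname"], row["drug_role"], row["primary_suspect"])
```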
Quick Start
```python
import pandas as pd

# Load the four core tables
demo = pd.read_parquet("fda_faers_demo.parquet")
drug = pd.read_parquet("fda_faers_drug.parquet")
reac = pd.read_parquet("fda_faers_reac.parquet")
outc = pd.read_parquet("fda_faers_outc.parquet")

# Age distribution of patients
print(demo["age_years"].describe())

# Most-reported drugs (primary suspects only)
primary_drugs = drug[drug["drug_role"] == "Primary Suspect"]
print(primary_drugs["drugname"].value_counts().head(20))

# Most common adverse reactions
print(reac["pt"].value_counts().head(20))

# Outcomes breakdown
print(outc["outcome"].value_counts())

# Reports involving death
deaths = outc[outc["outc_cod"] == "DE"]["primaryid"]
fatal_reports = demo[demo["primaryid"].isin(deaths)]
print(f"{len(fatal_reports):,} reports with fatal outcomes")
```
Pairs Well With
Join FAERS drug names against the FDA drug label database (openFDA) to map brand names to active ingredients, NDC codes, and approved indications for enriched signal analysis.
Map FAERS preferred terms (PT) up the MedDRA hierarchy to high-level group terms (HLGT) and system organ classes (SOC) for aggregate pharmacovigilance analysis.
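Rolling preferred terms up to system organ classes amounts to a lookup followed by a count. The sketch below shows the shape of that step with a tiny hand-written mapping. The mapping, the "Unmapped" bucket, and the function name are illustrative assumptions; a real analysis would load the licensed MedDRA hierarchy files and use each term's primary SOC.

```python
from collections import Counter

# Illustrative hand-written mapping only; not a substitute for MedDRA files.
PT_TO_SOC = {
    "nausea": "Gastrointestinal disorders",
    "vomiting": "Gastrointestinal disorders",
    "headache": "Nervous system disorders",
}


def soc_counts(preferred_terms, pt_to_soc):
    """Count reactions per system organ class, matching terms case-insensitively."""
    counts = Counter()
    for pt in preferred_terms:
        key = (pt or "").strip().lower()
        counts[pt_to_soc.get(key, "Unmapped")] += 1
    return counts


# Hypothetical values as they might appear in the reaction table's pt column.
reactions = ["Nausea", "Vomiting", "Headache", "Nausea", "Some Other Term"]
for soc, n in soc_counts(reactions, PT_TO_SOC).most_common():
    print(soc, n)
```

Keeping an explicit "Unmapped" bucket makes it easy to see how much of the reaction table the hierarchy file fails to cover.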
Data Provenance
Source: US Food and Drug Administration (FDA), Adverse Event Reporting System (FAERS)
Portal: FDA FAERS Quarterly Data Files
Coverage: Full year 2023 (Q1–Q4). Quarterly ZIP files parsed, concatenated, cleaned, and deduplicated.
Deduplication: FAERS cases can be re-reported across quarters. Deduplication retains the latest version of each case (the highest caseversion per caseid), which removes the duplicate submissions, roughly 8% of reports.
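The keep-latest-version rule described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the exact pipeline: the function names and sample rows are hypothetical, and ties on caseversion simply keep the first row seen.

```python
def dedupe_latest_version(demo_rows):
    """Keep one row per caseid: the one with the highest caseversion."""
    latest = {}
    for row in demo_rows:
        kept = latest.get(row["caseid"])
        # Strictly greater: on a tie, the first row seen stays (assumption).
        if kept is None or int(row["caseversion"]) > int(kept["caseversion"]):
            latest[row["caseid"]] = row
    return list(latest.values())


def filter_children(child_rows, kept_primaryids):
    """Drop child-table rows whose report was removed by deduplication."""
    return [row for row in child_rows if row["primaryid"] in kept_primaryids]


# Hypothetical rows: case 100 was reported twice, across two quarters.
demo_rows = [
    {"primaryid": "1001", "caseid": "100", "caseversion": "1", "_quarter": "2023Q1"},
    {"primaryid": "1002", "caseid": "100", "caseversion": "2", "_quarter": "2023Q3"},
    {"primaryid": "2001", "caseid": "200", "caseversion": "1", "_quarter": "2023Q2"},
]
reac_rows = [
    {"primaryid": "1001", "pt": "Nausea"},
    {"primaryid": "1002", "pt": "Nausea"},
    {"primaryid": "2001", "pt": "Headache"},
]

kept_demo = dedupe_latest_version(demo_rows)
kept_ids = {row["primaryid"] for row in kept_demo}
kept_reac = filter_children(reac_rows, kept_ids)
print(sorted(kept_ids), len(kept_reac))
```

Filtering the child tables by the surviving primaryid values is what keeps the drug, reaction, and outcome counts consistent with the deduplicated demographics table.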
Drug name normalisation: Drug names are uppercased, stripped of trailing punctuation, and standardised. Active ingredients (prod_ai) are similarly normalised.
Update frequency: FDA publishes new quarterly files approximately 3 months after each quarter ends. Annual subscribers receive quarterly refreshes.
License: FDA FAERS data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (deduplication, normalisation, derived fields, Parquet conversion).
Need multi-year coverage, custom data cuts, or bulk licensing?
Contact Sales