FDA FAERS Drug Adverse Events 2023

Name: FDA FAERS Drug Adverse Events 2023
Creator: ClarityStorm
License: https://www.claritystorm.com/license

FDA Adverse Event Reporting System — Cleaned & Deduplicated

The FDA Adverse Event Reporting System (FAERS) is the primary source for post-market drug safety surveillance in the United States. This dataset covers all of 2023 — 1.5M+ deduplicated adverse event reports across 7 relational tables, with normalised drug names, MedDRA reaction terms, and expanded outcome codes. Ideal for pharmacovigilance, signal detection, drug safety research, and AI/ML applications.

Sample: Public Domain | Paid: Commercial License | CSV + Parquet | 7 Tables | 2023 | Premium Dataset

Why not just download it from FDA?

You can — it's public domain. Here's what we saved you:

✓ Merged and deduplicated all 4 quarterly zip releases into a single consistent dataset using FDA's case deduplication logic
✓ All 7 relational tables extracted (Demographics, Drugs, Reactions, Outcomes, Therapy Dates, Indications, Sources) with consistent join keys
✓ Added primary_suspect flag to the Drugs table — essential for pharmacovigilance analysis, requires parsing the raw drug_role field
✓ Normalized drug names and date fields — FDA's raw ASCII files have encoding quirks and inconsistent date representations
✓ Parquet output: FDA distributes `$`-delimited ASCII text files; we convert all 7 tables to columnar format
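As a rough illustration of the primary_suspect flag mentioned above, the sketch below derives it from a raw role code. It assumes the raw column is called `role_cod` and uses the codes PS, SS, C, and I, as in FDA's quarterly DRUG files; the helper name `add_role_columns` is hypothetical and this is not necessarily how the delivered table was built.

```python
import pandas as pd

# Raw role codes (assumed) mapped to the drug_role labels used in this dataset.
ROLE_LABELS = {
    "PS": "Primary Suspect",
    "SS": "Secondary Suspect",
    "C": "Concomitant",
    "I": "Interacting",
}

def add_role_columns(drug: pd.DataFrame, raw_col: str = "role_cod") -> pd.DataFrame:
    """Return a copy with a readable drug_role label and a boolean primary_suspect flag."""
    out = drug.copy()
    # Tolerate missing values, stray whitespace, and lower-case codes.
    codes = out[raw_col].fillna("").astype(str).str.strip().str.upper()
    out["drug_role"] = codes.map(ROLE_LABELS)   # unknown codes become NaN
    out["primary_suspect"] = codes.eq("PS")
    return out

raw = pd.DataFrame({"primaryid": ["1", "1", "2"], "role_cod": ["PS", "c ", None]})
flagged = add_role_columns(raw)
print(flagged[["primaryid", "drug_role", "primary_suspect"]])
```

Keeping the flag as a separate boolean column makes filters such as `drug[drug["primary_suspect"]]` cheaper to write than repeated string comparisons.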

⏱ Skip ~4–6 hours of merging, deduplication, and format wrangling. FDA's deduplication logic alone requires careful reading of their technical documentation.

What you'd need to do yourself:

Download 4 quarterly zip archives from the FDA FAERS portal (Q1–Q4 2023)
Each zip contains 7 ASCII text files in `$`-delimited format, with encoding quirks
Implement FDA's deduplication logic — documented in their technical specifications, non-trivial to apply correctly
Join 7 tables using the correct case/drug/report hierarchies (primaryid, caseid, isi_sadr)
Convert all files from `$`-delimited ASCII to a usable analytical format
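The first steps in the list above can be sketched roughly as follows. The example parses in-memory text rather than real downloads, takes the delimiter as a parameter so it can be checked against the README shipped with each FDA archive, and tags every row with its source quarter. The function name `load_quarter` and the sample rows are illustrative assumptions.

```python
import io
import pandas as pd

def load_quarter(text: str, quarter: str, sep: str) -> pd.DataFrame:
    """Parse one delimited quarterly file and record which quarter it came from."""
    frame = pd.read_csv(
        io.StringIO(text),
        sep=sep,
        dtype=str,              # keep identifiers as strings, no numeric coercion
        keep_default_na=False,  # do not silently turn strings like "NA" into NaN
    )
    frame.columns = [c.strip().lower() for c in frame.columns]
    frame["_quarter"] = quarter
    return frame

# Tiny stand-ins for two quarterly demographics files (made-up rows).
q1 = "primaryid$caseid$caseversion\n1001$100$1\n2001$200$1\n"
q2 = "primaryid$caseid$caseversion\n1002$100$2\n"

demo_raw = pd.concat(
    [load_quarter(q1, "2023Q1", sep="$"), load_quarter(q2, "2023Q2", sep="$")],
    ignore_index=True,
)
print(demo_raw)
```

For real files you would pass an open file handle from the zip archive instead of `io.StringIO`, and probably set an explicit `encoding`, since the raw files are not guaranteed to be clean UTF-8.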

1.5M+ Adverse Event Reports

7.4M+ Drug Entries

5.8M+ Reaction Terms

7 Tables

Use Cases

Pharmacovigilance & Signal Detection

Identify statistical signals for drug-adverse event associations using disproportionality analysis (PRR, ROR) across 1.5M+ reports. Build real-world safety monitoring pipelines.
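For readers new to disproportionality analysis, here is a minimal sketch of PRR and ROR computed from a 2x2 table of report counts: a = reports with the drug and the event, b = the drug without the event, c = the event without the drug, d = neither. It assumes `drug` and `reac` frames with the `primaryid`, `drugname`, and `pt` columns used in the Quick Start; the function name and toy data are hypothetical, and the sketch does no stratification or multiple-testing control.

```python
import math
import pandas as pd

def disproportionality(drug: pd.DataFrame, reac: pd.DataFrame, drugname: str, pt: str) -> dict:
    """PRR and ROR (with a 95% CI for ROR) from report-level 2x2 counts."""
    all_ids = set(drug["primaryid"]) | set(reac["primaryid"])
    exposed = set(drug.loc[drug["drugname"] == drugname, "primaryid"])
    event = set(reac.loc[reac["pt"] == pt, "primaryid"])

    a = len(exposed & event)            # drug and event
    b = len(exposed - event)            # drug, no event
    c = len(event - exposed)            # event, other drugs
    d = len(all_ids - exposed - event)  # neither
    if min(a, b, c, d) == 0:
        raise ValueError("a 2x2 cell is zero; PRR/ROR are undefined without a correction")

    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return {
        "a": a, "b": b, "c": c, "d": d,
        "prr": prr,
        "ror": ror,
        "ror_ci95": (math.exp(math.log(ror) - 1.96 * se_log_ror),
                     math.exp(math.log(ror) + 1.96 * se_log_ror)),
    }

# Toy data: 10 reports, DRUG X in reports 1-4, Nausea in reports 1, 2, 3, and 5.
ids = [str(i) for i in range(1, 11)]
drug = pd.DataFrame({"primaryid": ids, "drugname": ["DRUG X"] * 4 + ["DRUG Y"] * 6})
reac = pd.DataFrame({
    "primaryid": ids,
    "pt": ["Nausea", "Nausea", "Nausea", "Headache", "Nausea"] + ["Headache"] * 5,
})
signal = disproportionality(drug, reac, "DRUG X", "Nausea")
print(signal)
```

Counting distinct reports rather than rows matters here, because the drug and reaction tables each hold several rows per report.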

Drug Safety Research

Analyse adverse event patterns by drug class, patient demographics, reporter type, and outcome severity. Supplement clinical trial data with real-world post-market evidence.

NLP on Adverse Event Narratives

7.4M+ drug name entries and 5.8M+ MedDRA reaction terms for named entity recognition, drug normalisation, and adverse event classification model training.

Clinical Trial Design

Use historical adverse event rates to inform safety monitoring plans, define endpoint thresholds, and identify high-risk patient subpopulations for new drug trials.

Insurance & Actuarial Risk

Model drug-related adverse event rates by indication, age group, and outcome severity for pharmaceutical liability and health insurance risk models.

Regulatory Intelligence

Track FDA reporting trends, manufacturer submission rates, and expedited vs. periodic reporting patterns. Benchmark drug safety profiles against the FAERS baseline.

Tables

7 relational tables, all joining on primaryid. Each table is delivered as both .csv and .parquet.

fda_faers_demo.parquet

1.5M+ rows

One row per deduplicated adverse event report. Demographics, reporter type, patient age, sex, weight.

fda_faers_drug.parquet

7.4M+ rows

One row per drug per report. Normalised drug names, active ingredient, drug role (primary/secondary suspect, concomitant).

fda_faers_reac.parquet

5.8M+ rows

One row per adverse reaction per report. MedDRA preferred terms, title-cased for consistency.

fda_faers_outc.parquet

1.2M+ rows

One row per serious outcome per report. Includes Death, Hospitalization, Life-Threatening, Disability, and more.

fda_faers_ther.parquet

2.6M+ rows

Drug therapy start and end dates per report. Duration and unit of measure.

fda_faers_indi.parquet

4.5M+ rows

Drug indication (reason for use) per drug per report. MedDRA preferred terms.

fda_faers_rpsr.parquet

52K+ rows

Report source per report. Manufacturer, health professional, consumer, foreign, literature, study.
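A small sketch of how the tables listed above might be joined on primaryid. The frames are tiny made-up stand-ins for the real Parquet files, using only column names that appear in the Schema and Quick Start sections.

```python
import pandas as pd

# Made-up stand-ins for fda_faers_demo, fda_faers_drug, and fda_faers_reac.
demo = pd.DataFrame({"primaryid": ["1", "2", "3"],
                     "sex_label": ["Female", "Male", "Unknown"]})
drug = pd.DataFrame({
    "primaryid": ["1", "1", "2", "3"],
    "drugname": ["DRUG A", "DRUG B", "DRUG A", "DRUG C"],
    "drug_role": ["Primary Suspect", "Concomitant", "Primary Suspect", "Primary Suspect"],
})
reac = pd.DataFrame({"primaryid": ["1", "1", "2", "3"],
                     "pt": ["Nausea", "Headache", "Nausea", "Rash"]})

# Restrict to primary suspects before joining to limit row fan-out.
suspects = drug[drug["drug_role"] == "Primary Suspect"]

pairs = (
    suspects.merge(reac, on="primaryid", how="inner")   # many-to-many within a report
            .merge(demo, on="primaryid", how="left",
                   validate="many_to_one")              # demo has one row per report
)

# Count distinct reports per drug-reaction pair, not joined rows.
pair_counts = (
    pairs.groupby(["drugname", "pt"])["primaryid"]
         .nunique()
         .sort_values(ascending=False)
)
print(pair_counts)
```

Because drugs and reactions are both one-to-many against a report, joining them multiplies rows; counting with `nunique()` on primaryid avoids overstating report counts.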

Schema

Demographics Table (`fda_faers_demo`)

Primary table — one row per deduplicated adverse event report. Deduplication retains the latest case version per caseid.

Field	Type	Description
primaryid	string	Unique FAERS report identifier
caseid	string	Case ID — groups all versions of the same case
caseversion	int	Case version number (deduped: latest version kept per caseid)
fda_dt	string	FDA receipt date (YYYY-MM-DD)
rept_dt	string	Report date (YYYY-MM-DD)
init_fda_dt	string	Initial FDA receipt date (YYYY-MM-DD)
mfr_sndr	string	Manufacturer or sender name
age_years	float	Patient age in years (normalised from age + age_cod)
sex_label	string	Patient sex: Male / Female / Unknown
wt_kg	float	Patient weight in kg (normalised from wt + wt_cod)
reporter_type	string	Reporter occupation: Physician / Pharmacist / Consumer / etc.
report_type	string	Report type: Expedited / Periodic / Direct / Voluntary
occr_country	string	Country where event occurred
_quarter	string	Source data quarter (e.g. 2023Q1)
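To illustrate what "normalised from age + age_cod" could involve, the sketch below converts a raw age value and unit code into years. The unit codes (DEC, YR, MON, WK, DY, HR) follow FDA's raw demographics files; the conversion factors and the function name `to_age_years` are assumptions for illustration, not the documented pipeline behind `age_years`.

```python
import pandas as pd

# Multipliers that convert one unit of each age code into years (assumed factors).
YEARS_PER_UNIT = {
    "DEC": 10.0,
    "YR": 1.0,
    "MON": 1 / 12,
    "WK": 7 / 365.25,
    "DY": 1 / 365.25,
    "HR": 1 / (24 * 365.25),
}

def to_age_years(age: pd.Series, age_cod: pd.Series) -> pd.Series:
    """Convert raw age values and unit codes to years; unparseable input becomes NaN."""
    value = pd.to_numeric(age, errors="coerce")
    factor = age_cod.fillna("").astype(str).str.strip().str.upper().map(YEARS_PER_UNIT)
    return value * factor

raw = pd.DataFrame({
    "age": ["45", "6", "3", "abc"],
    "age_cod": ["YR", "MON", "DEC", "YR"],
})
raw["age_years"] = to_age_years(raw["age"], raw["age_cod"])
print(raw)
```

The same pattern would apply to `wt_kg`, with a lookup from weight unit codes to kilograms.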

Drug Table (`fda_faers_drug`)

Field	Type	Description
primaryid	string	Links to demo table
caseid	string	Case ID
drug_seq	string	Drug sequence number within the case
drugname	string	Drug name as reported (uppercase normalised)
prod_ai	string	Active ingredient name (uppercase normalised)
drug_role	string	Drug role: Primary Suspect / Secondary Suspect / Concomitant / Interacting
dose_vbm	string	Dose, route, and frequency as reported
cum_dose_chr	string	Cumulative dose to first event
dechal	string	Dechallenge result (effect stopped on withdrawal)
rechallenge	string	Rechallenge result (effect recurred on re-exposure)
nda_num	string	FDA NDA/BLA number

Quick Start

import pandas as pd

demo = pd.read_parquet("fda_faers_demo.parquet")
drug = pd.read_parquet("fda_faers_drug.parquet")
reac = pd.read_parquet("fda_faers_reac.parquet")
outc = pd.read_parquet("fda_faers_outc.parquet")

# Age distribution of patients
print(demo["age_years"].describe())

# Most-reported drugs (primary suspects only)
primary_drugs = drug[drug["drug_role"] == "Primary Suspect"]
print(primary_drugs["drugname"].value_counts().head(20))

# Most common adverse reactions
print(reac["pt"].value_counts().head(20))

# Outcomes breakdown
print(outc["outcome"].value_counts())

# Reports involving death
deaths = outc[outc["outc_cod"] == "DE"]["primaryid"]
fatal_reports = demo[demo["primaryid"].isin(deaths)]
print(f"{len(fatal_reports):,} reports with fatal outcomes")

Pairs Well With

External: openFDA Drug Label API

Join FAERS drug names against the FDA drug label database (openFDA) to map brand names to active ingredients, NDC codes, and approved indications for enriched signal analysis.

External: MedDRA Hierarchy

Map FAERS preferred terms (PT) up the MedDRA hierarchy to high-level group terms (HLGT) and system organ classes (SOC) for aggregate pharmacovigilance analysis.
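A minimal sketch of that roll-up, assuming you hold a licensed MedDRA export that can be reduced to a PT-to-SOC lookup. The `pt_to_soc` frame below is a three-row hypothetical stand-in, not real MedDRA content beyond these well-known terms, and `rollup_to_soc` is an illustrative name.

```python
import pandas as pd

reac = pd.DataFrame({"primaryid": ["1", "1", "2", "3"],
                     "pt": ["Nausea", "Headache", "Nausea", "Rash"]})

# Hypothetical lookup derived from a licensed MedDRA release.
pt_to_soc = pd.DataFrame({
    "pt": ["NAUSEA", "HEADACHE", "RASH"],
    "soc": ["Gastrointestinal disorders",
            "Nervous system disorders",
            "Skin and subcutaneous tissue disorders"],
})

def rollup_to_soc(reac: pd.DataFrame, pt_to_soc: pd.DataFrame) -> pd.DataFrame:
    """Attach a system organ class to each reaction row via a case-insensitive PT match."""
    left = reac.assign(_key=reac["pt"].str.casefold())
    right = (pt_to_soc.assign(_key=pt_to_soc["pt"].str.casefold())[["_key", "soc"]]
                      .drop_duplicates("_key"))
    return left.merge(right, on="_key", how="left").drop(columns="_key")

with_soc = rollup_to_soc(reac, pt_to_soc)
soc_counts = with_soc.groupby("soc")["primaryid"].nunique()
print(soc_counts)
```

The case-insensitive key is there because the reaction terms in this dataset are title-cased, which may not match the casing in your MedDRA export.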

Pricing

Sample

Free

1,000 rows (Demographics table, CSV) + schema docs

Public Domain

Complete

$149

All 7 tables — 1.5M+ reports (2023), CSV + Parquet

Commercial License

Annual

$299/yr

Full dataset + quarterly updates as FDA publishes new reports

Commercial License

Data Provenance

Source: US Food and Drug Administration (FDA), Adverse Event Reporting System (FAERS)

Portal: FDA FAERS Quarterly Data Files

Coverage: Full year 2023 (Q1–Q4). Quarterly ZIP files parsed, concatenated, cleaned, and deduplicated.

Deduplication: FAERS cases can be re-reported across quarters. Deduplication retains the latest version of each case (by caseid + caseversion), which removes the roughly 8% of submissions that are duplicates.
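The rule described above (keep the latest caseversion per caseid) can be sketched as follows. This is an illustrative reading of that one sentence, not the exact pipeline; the function names and sample rows are hypothetical.

```python
import pandas as pd

def latest_case_versions(demo: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per caseid: the one with the highest caseversion."""
    ordered = (
        demo.assign(caseversion=pd.to_numeric(demo["caseversion"]))  # "10" must sort after "2"
            .sort_values(["caseid", "caseversion"])
    )
    return ordered.drop_duplicates(subset="caseid", keep="last").reset_index(drop=True)

def keep_matching_rows(child: pd.DataFrame, kept_demo: pd.DataFrame) -> pd.DataFrame:
    """Drop child-table rows whose report was superseded by a later version."""
    return child[child["primaryid"].isin(kept_demo["primaryid"])]

demo_all = pd.DataFrame({
    "primaryid": ["1001", "1002", "2001"],
    "caseid": ["100", "100", "200"],
    "caseversion": ["1", "2", "1"],
})
drug_all = pd.DataFrame({"primaryid": ["1001", "1002", "2001"],
                         "drugname": ["DRUG A", "DRUG A", "DRUG B"]})

demo_dedup = latest_case_versions(demo_all)
drug_dedup = keep_matching_rows(drug_all, demo_dedup)
print(demo_dedup)
```

Filtering the child tables by the surviving primaryid values keeps all seven tables consistent with the deduplicated demographics table.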

Drug name normalisation: Drug names are uppercased, stripped of trailing punctuation, and standardised. Active ingredients (prod_ai) are similarly normalised.
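A rough sketch of the first two steps named above (uppercasing and stripping trailing punctuation), with whitespace collapsing added as an assumption. The further "standardised" step is not specified here, so it is not reproduced; `normalise_drug_name` is a hypothetical helper.

```python
import re
import pandas as pd

_WHITESPACE = re.compile(r"\s+")
_TRAILING_PUNCT = re.compile(r"[\s.,;:/\\-]+$")

def normalise_drug_name(name):
    """Uppercase, collapse whitespace, and strip trailing punctuation; empty input becomes None."""
    if pd.isna(name):
        return None
    text = _WHITESPACE.sub(" ", str(name)).strip().upper()
    text = _TRAILING_PUNCT.sub("", text)
    return text or None

names = pd.Series([" Humira. ", "aspirin,, ", "  metformin   hcl;", None, "..."])
cleaned = names.map(normalise_drug_name)
print(cleaned.tolist())
```

Mapping brand names to a common active ingredient would need a reference vocabulary and is outside what this sketch attempts.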

Update frequency: FDA publishes new quarterly files approximately 3 months after each quarter ends. Annual subscribers receive quarterly refreshes.

License: FDA FAERS data is a US federal government work in the public domain. Paid tiers are licensed under the ClarityStorm Commercial Data License covering our pipeline and enrichment work (deduplication, normalisation, derived fields, Parquet conversion).

Need multi-year coverage, custom data cuts, or bulk licensing?

Contact Sales
