The National Flood Insurance Program (NFIP) has paid out over $70 billion in claims since 1978. Every one of those claims — 2.7 million and counting — is now available as a cleaned, analysis-ready dataset. If you work in real estate risk, climate finance, insurance underwriting, or urban planning, this is the canonical dataset for understanding flood exposure across the United States.
In this tutorial we'll load the ClarityStorm FEMA NFIP dataset, profile the geographic distribution of flood claims, compute decade-over-decade cost trends, and identify the ZIP codes with the highest cumulative flood risk. All in under 60 lines of Python.
What's in the Dataset
- 2.7M+ paid flood insurance claims from 1978 to present
- Flood zone classifications (A, V, X, etc.) for each property
- Coverage amounts, building characteristics, and damage breakdowns
- ZIP-level geocoding and ClarityStorm-computed risk scores
- Available as both CSV and Parquet for fast analytical queries
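One caveat if you work from the CSV build: pandas infers ZIP codes as integers by default, which silently drops leading zeros (New England ZIPs like 02134). A minimal loader sketch, assuming the column name `reported_zip_code` used later in this tutorial and a path from your download:

```python
import pandas as pd

def load_claims_csv(path) -> pd.DataFrame:
    # Force ZIP codes to strings so leading zeros (e.g. 02134) survive the parse.
    return pd.read_csv(path, dtype={"reported_zip_code": "string"})
```

The Parquet build stores column types explicitly, so this only matters for CSV.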
Loading the Data
import pandas as pd
# Parquet loads ~10x faster than CSV
claims = pd.read_parquet("fema_flood_insurance_claims.parquet")
print(f"Total claims: {len(claims):,}")
print(f"Year range: {claims['year_of_loss'].min()} – {claims['year_of_loss'].max()}")
print(f"States: {claims['state'].nunique()}")
print(claims[["state", "county", "year_of_loss", "amount_paid_on_building_claim"]].head())
Rising Flood Costs by Decade
Climate change, coastal development, and aging infrastructure have dramatically increased flood damage costs. Grouping claims by decade reveals the acceleration — the 2010s saw more total payouts than the previous three decades combined.
import matplotlib.pyplot as plt
claims["decade"] = (claims["year_of_loss"] // 10) * 10
decade_costs = claims.groupby("decade")["amount_paid_on_building_claim"].sum() / 1e9
plt.figure(figsize=(10, 5))
decade_costs.plot(kind="bar", color="#0ea5e9", edgecolor="white")
plt.title("Total NFIP Building Claims Paid by Decade ($B)", fontsize=14, fontweight="bold")
plt.ylabel("Billions ($)")
plt.xlabel("Decade")
plt.xticks(rotation=0)
plt.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.savefig("flood_costs_by_decade.png", dpi=150)
Highest-Risk ZIP Codes
Some ZIP codes account for a disproportionate share of all flood claims. Aggregating by ZIP reveals the hotspots — many cluster along the Gulf Coast, South Florida, and the lower Mississippi River valley. These concentrations are critical for insurance pricing and real estate risk assessment.
# Top 20 ZIP codes by cumulative claims
zip_risk = (
    claims.groupby("reported_zip_code")
    .agg(
        total_claims=("amount_paid_on_building_claim", "count"),
        total_paid=("amount_paid_on_building_claim", "sum"),
        avg_claim=("amount_paid_on_building_claim", "mean"),
    )
    .sort_values("total_claims", ascending=False)
    .head(20)
)
zip_risk["total_paid_M"] = zip_risk["total_paid"] / 1e6
print(zip_risk[["total_claims", "total_paid_M", "avg_claim"]].to_string())
What to Build Next
- Flood risk scoring model: predict expected annual loss per ZIP code using historical claim frequency and severity
- Climate trend analysis: correlate rising claim volumes with sea level rise, hurricane intensity, and urbanization data
- Real estate risk overlay: join with property listings to flag high-flood-risk addresses before purchase
- Insurance portfolio analysis: model NFIP exposure concentration and reinsurance triggers
- Repetitive loss identification: find properties with 2+ claims to quantify moral hazard and mitigation ROI
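As a starting point for the first idea, a naive expected annual loss per ZIP is historical claim frequency times mean severity. A rough sketch using only the columns shown earlier in this tutorial (a real model would add exposure counts, inflation adjustment, and trend terms):

```python
import pandas as pd

def expected_annual_loss(claims: pd.DataFrame) -> pd.DataFrame:
    """Naive per-ZIP EAL: (claims per observed year) x (mean paid per claim)."""
    years = claims["year_of_loss"].nunique()  # crude proxy for the observation window
    per_zip = claims.groupby("reported_zip_code").agg(
        n_claims=("amount_paid_on_building_claim", "count"),
        mean_paid=("amount_paid_on_building_claim", "mean"),
    )
    per_zip["annual_frequency"] = per_zip["n_claims"] / years
    per_zip["expected_annual_loss"] = per_zip["annual_frequency"] * per_zip["mean_paid"]
    return per_zip.sort_values("expected_annual_loss", ascending=False)
```

Note the frequency denominator here counts distinct loss years in the data, which understates the window for ZIPs with quiet years; dividing by the full span of years covered would be a stricter choice.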
The free sample contains 1,000 rows. The complete dataset includes 2.7M+ NFIP claims with flood zones, damage breakdowns, and ZIP-level risk scores, delivered as CSV and Parquet under a commercial license.