The National Flood Insurance Program (NFIP) has paid out over $70 billion in claims since 1978. Every one of those claims — 2.7 million and counting — is now available as a cleaned, analysis-ready dataset. If you work in real estate risk, climate finance, insurance underwriting, or urban planning, this is the canonical dataset for understanding flood exposure across the United States.
In this tutorial we'll load the ClarityStorm FEMA NFIP dataset, profile the geographic distribution of flood claims, compute decade-over-decade cost trends, and identify the ZIP codes with the highest cumulative flood risk. All in under 60 lines of Python.
What's in the Dataset
- 2.7M+ paid flood insurance claims from 1978 to present
- Flood zone classifications (A, V, X, etc.) for each property
- Coverage amounts, building characteristics, and damage breakdowns
- ZIP-level geocoding and ClarityStorm-computed risk scores
- Available as both CSV and Parquet for fast analytical queries
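One caveat if you work from the CSV build: pandas infers ZIP codes as integers by default, which silently drops leading zeros (New England ZIPs like 02134). A minimal loader sketch, assuming the column name `reported_zip_code` used later in this tutorial and a path from your download:

```python
import pandas as pd

def load_claims_csv(path) -> pd.DataFrame:
    # Force ZIP codes to strings so leading zeros (e.g. 02134) survive the parse.
    return pd.read_csv(path, dtype={"reported_zip_code": "string"})
```

The Parquet build stores column types explicitly, so this only matters for CSV.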
Loading the Data
import pandas as pd
# Parquet loads ~10x faster than CSV
claims = pd.read_parquet("fema_flood_insurance_claims.parquet")
print(f"Total claims: {len(claims):,}")
print(f"Year range: {claims['year_of_loss'].min()} – {claims['year_of_loss'].max()}")
print(f"States: {claims['state'].nunique()}")
print(claims[["state", "county", "year_of_loss", "amount_paid_on_building_claim"]].head())
Rising Flood Costs by Decade
Climate change, coastal development, and aging infrastructure have dramatically increased flood damage costs. Grouping claims by decade reveals the acceleration — the 2010s saw more total payouts than the previous three decades combined.
import matplotlib.pyplot as plt
claims["decade"] = (claims["year_of_loss"] // 10) * 10
decade_costs = claims.groupby("decade")["amount_paid_on_building_claim"].sum() / 1e9
plt.figure(figsize=(10, 5))
decade_costs.plot(kind="bar", color="#0ea5e9", edgecolor="white")
plt.title("Total NFIP Building Claims Paid by Decade ($B)", fontsize=14, fontweight="bold")
plt.ylabel("Billions ($)")
plt.xlabel("Decade")
plt.xticks(rotation=0)
plt.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.savefig("flood_costs_by_decade.png", dpi=150)
Highest-Risk ZIP Codes
Some ZIP codes account for a disproportionate share of all flood claims. Aggregating by ZIP reveals the hotspots — many cluster along the Gulf Coast, South Florida, and the lower Mississippi River valley. These concentrations are critical for insurance pricing and real estate risk assessment.
# Top 20 ZIP codes by cumulative claims
zip_risk = (
    claims.groupby("reported_zip_code")
    .agg(
        total_claims=("amount_paid_on_building_claim", "count"),
        total_paid=("amount_paid_on_building_claim", "sum"),
        avg_claim=("amount_paid_on_building_claim", "mean"),
    )
    .sort_values("total_claims", ascending=False)
    .head(20)
)
zip_risk["total_paid_M"] = zip_risk["total_paid"] / 1e6
print(zip_risk[["total_claims", "total_paid_M", "avg_claim"]].to_string())
What to Build Next
- Flood risk scoring model: predict expected annual loss per ZIP code using historical claim frequency and severity
- Climate trend analysis: correlate rising claim volumes with sea level rise, hurricane intensity, and urbanization data
- Real estate risk overlay: join with property listings to flag high-flood-risk addresses before purchase
- Insurance portfolio analysis: model NFIP exposure concentration and reinsurance triggers
- Repetitive loss identification: find properties with 2+ claims to quantify moral hazard and mitigation ROI
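As a starting point for the first idea, a naive expected annual loss per ZIP is historical claim frequency times mean severity. A rough sketch using only the columns shown earlier in this tutorial (a real model would add exposure counts, inflation adjustment, and trend terms):

```python
import pandas as pd

def expected_annual_loss(claims: pd.DataFrame) -> pd.DataFrame:
    """Naive per-ZIP EAL: (claims per observed year) x (mean paid per claim)."""
    years = claims["year_of_loss"].nunique()  # crude proxy for the observation window
    per_zip = claims.groupby("reported_zip_code").agg(
        n_claims=("amount_paid_on_building_claim", "count"),
        mean_paid=("amount_paid_on_building_claim", "mean"),
    )
    per_zip["annual_frequency"] = per_zip["n_claims"] / years
    per_zip["expected_annual_loss"] = per_zip["annual_frequency"] * per_zip["mean_paid"]
    return per_zip.sort_values("expected_annual_loss", ascending=False)
```

Note the frequency denominator here counts distinct loss years in the data, which understates the window for ZIPs with quiet years; dividing by the full span of years covered would be a stricter choice.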
The free sample contains 1,000 rows. The complete dataset includes 2.7M+ NFIP claims with flood zones, damage breakdowns, and ZIP-level risk scores, delivered as CSV and Parquet under a commercial license.