Since 1987, US industrial facilities have been required by law to report their toxic chemical releases to the EPA's Toxics Release Inventory (TRI). Every year, roughly 20,000 facilities — manufacturers, power plants, metal smelters, chemical companies — file detailed reports on how many pounds of each toxic chemical they released to air, water, and land. The result is one of the most important environmental datasets ever created: 37 years of facility-level chemical release data covering 650+ regulated chemicals.
In this tutorial we'll load the ClarityStorm EPA TRI dataset, analyze national release trends, rank PFAS releases by state and facility, and create an interactive map of carcinogen-emitting facilities.
Two Tables: Releases and Facilities
The ClarityStorm TRI release ships two tables. The releases table is the core — one row per facility-year-chemical combination, with pounds released to each medium (air, water, land) and total transfer quantities. The facilities table provides geographic detail (lat/lon, NAICS code, parent company) for joining and mapping.
- 3M+ release records (releases table), 1987–present
- 650+ regulated chemicals with CAS numbers
- Carcinogen, PBT (persistent bioaccumulative toxic), and PFAS flags
- Release breakdown: air fugitive, air stack, surface water, underground injection, land
- 20K+ unique reporting facilities with lat/lon (facilities table)
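The two tables are meant to be joined, so it helps to check the join before doing any analysis. The sketch below is a minimal illustration on made-up rows, assuming the shared key is `trifid` (the column used for merging later in this tutorial). The `_demo` frames and their values are hypothetical stand-ins, not real TRI data.

```python
import pandas as pd

# Tiny synthetic stand-ins for the two tables (values are made up for illustration).
releases_demo = pd.DataFrame({
    "trifid": ["A1", "A1", "B2", "C3"],
    "year": [2021, 2022, 2022, 2022],
    "chemical_name": ["Toluene", "Toluene", "Lead", "Benzene"],
    "total_release": [100.0, 80.0, 5.0, 12.0],
})
facilities_demo = pd.DataFrame({
    "trifid": ["A1", "B2"],
    "facility_name": ["Plant A", "Plant B"],
    "latitude": [40.0, 35.5],
    "longitude": [-80.0, -97.2],
})

# A left join keeps every release row. validate="many_to_one" raises if a
# facility ID appears more than once in the facilities table, and
# indicator=True adds a _merge column showing which rows found a match.
joined = releases_demo.merge(
    facilities_demo,
    on="trifid",
    how="left",
    validate="many_to_one",
    indicator=True,
)
unmatched = joined.loc[joined["_merge"] == "left_only", "trifid"].unique()
print(f"Rows: {len(joined)}, unmatched facility IDs: {list(unmatched)}")
```

Release rows without a matching facility keep their totals but have no coordinates, so they would count toward national sums and silently drop out of any map.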
Loading the Data
import pandas as pd
releases = pd.read_parquet("tri_releases.parquet")
facilities = pd.read_parquet("tri_facilities.parquet")
print(f"Release records: {len(releases):,}")
print(f"Unique chemicals: {releases['chemical_name'].nunique():,}")
print(f"Year range: {releases['year'].min()} – {releases['year'].max()}")
print(f"Facilities: {len(facilities):,}")
# Total releases by medium
mediums = ["total_air", "total_water", "total_land"]
print(releases[mediums].sum() / 1e9)  # billions of pounds
National Release Trends
Total TRI releases have declined significantly since the program's inception — a success story of regulatory pressure and industrial efficiency improvements. But not all chemicals or media follow the same trajectory. Air emissions declined fastest. Some chemicals show alarming recent upticks. The 37-year time series makes these trends visible.
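One way to find chemicals that move against the overall decline is to compare average annual releases between an early and a recent window. The sketch below runs on synthetic rows and assumes the `year`, `chemical_name`, and `total_release` columns used elsewhere in this tutorial. The helper `pct_change_by_chemical` and the window choices are illustrative, not part of the dataset or any library.

```python
import pandas as pd

# Synthetic example rows; real values would come from the releases table.
demo = pd.DataFrame({
    "year": [2015, 2016, 2021, 2022] * 2,
    "chemical_name": ["Toluene"] * 4 + ["Lead"] * 4,
    "total_release": [100.0, 100.0, 50.0, 50.0, 10.0, 10.0, 15.0, 15.0],
})

def pct_change_by_chemical(df, early, late):
    """Percent change in mean annual releases between two inclusive year windows."""
    # Collapse to one total per chemical per year first.
    annual = (
        df.groupby(["chemical_name", "year"])["total_release"]
        .sum()
        .reset_index()
    )

    def window_mean(lo, hi):
        in_window = annual[annual["year"].between(lo, hi)]
        return in_window.groupby("chemical_name")["total_release"].mean()

    early_mean = window_mean(*early)
    late_mean = window_mean(*late)
    # Chemicals missing from either window become NaN and are dropped.
    change = (late_mean - early_mean) / early_mean * 100
    return change.dropna().sort_values(ascending=False)

change = pct_change_by_chemical(demo, early=(2015, 2016), late=(2021, 2022))
print(change.round(1))
```

Chemicals at the top of the result are the ones whose releases grew most between the two windows. Averaging over a window instead of comparing single years dampens one-off spikes.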
import matplotlib.pyplot as plt
# Annual total releases by medium
annual = (
    releases.groupby("year")[["total_air", "total_water", "total_land"]]
    .sum()
    .reset_index()
)
fig, ax = plt.subplots(figsize=(12, 5))
colors = {"total_air": "#0ea5e9", "total_water": "#22c55e", "total_land": "#f59e0b"}
for col, color in colors.items():
    ax.plot(annual["year"], annual[col] / 1e6, label=col.replace("total_", "").title(),
            color=color, linewidth=2)
ax.set_title("US TRI Toxic Releases 1987–Present (millions lbs)", fontsize=13)
ax.set_xlabel("Year")
ax.set_ylabel("Millions of Pounds Released")
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.savefig("tri_trends.png", dpi=150)
PFAS Analysis
PFAS (per- and polyfluoroalkyl substances, the 'forever chemicals') became a major EPA focus after 2020. The TRI dataset flags PFAS chemicals explicitly. Breaking PFAS releases down by state and facility shows which industries and geographies account for most of the reported releases, which makes the results directly useful for environmental justice research and regulatory advocacy.
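The industry side of that question needs the NAICS code from the facilities table. The sketch below is a hedged illustration on synthetic rows: the column name `naics_code` is an assumption (the tutorial says the facilities table carries a NAICS code but does not name the column), and the `_demo` frames are made up.

```python
import pandas as pd

# Synthetic rows. `naics_code` is an ASSUMED column name, not confirmed by the source.
releases_demo = pd.DataFrame({
    "trifid": ["A1", "A1", "B2", "C3"],
    "year": [2020, 2021, 2021, 2019],
    "pfas": [1, 1, 1, 1],
    "total_release": [30.0, 20.0, 60.0, 999.0],
})
facilities_demo = pd.DataFrame({
    "trifid": ["A1", "B2", "C3"],
    "naics_code": ["325211", "562211", "325211"],
})

# Same filter as the state-level analysis: PFAS-flagged rows from 2020 onward.
pfas_demo = releases_demo[(releases_demo["pfas"] == 1) & (releases_demo["year"] >= 2020)]

by_industry = (
    pfas_demo.merge(facilities_demo, on="trifid", how="left")
    # The first three NAICS digits identify the subsector.
    .assign(naics3=lambda d: d["naics_code"].astype(str).str[:3])
    .groupby("naics3")["total_release"]
    .sum()
    .sort_values(ascending=False)
)
share = (by_industry / by_industry.sum() * 100).round(1)
print(share)
```

Grouping at the three-digit level keeps the table short. Using the full six-digit code would separate individual industries at the cost of many more rows.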
# PFAS releases by state, 2020–present
pfas = releases[(releases["pfas"] == 1) & (releases["year"] >= 2020)]
pfas_by_state = (
    pfas.groupby("state")["total_release"]
    .sum()
    .sort_values(ascending=False)
    .head(15)
    .reset_index()
)
pfas_by_state["total_lbs"] = pfas_by_state["total_release"].map(lambda x: f"{x:,.0f}")
print(pfas_by_state[["state", "total_lbs"]].to_string(index=False))
# Top PFAS-emitting facilities
top_pfas_facilities = (
    pfas.merge(facilities[["trifid", "facility_name", "latitude", "longitude"]], on="trifid")
    .groupby(["trifid", "facility_name", "state", "latitude", "longitude"])["total_release"]
    .sum()
    .sort_values(ascending=False)
    .head(20)
    .reset_index()
)
print(top_pfas_facilities[["facility_name", "state", "total_release"]].head(10))
Interactive Facility Map
The facilities table provides lat/lon coordinates for every reporting facility. Combined with total carcinogen releases, this enables an interactive map where each facility's bubble is sized by emission volume — a powerful tool for environmental journalism, ESG screening, or regulatory research.
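Facility totals span several orders of magnitude, so a linear bubble size would let a handful of large emitters cover the map. The sketch below shows log scaling with a minimum radius, which is the same idea the map code uses. The helper name `bubble_radius` and its default parameters are illustrative choices, not a folium API.

```python
import numpy as np

def bubble_radius(pounds, scale=1.5, min_radius=3.0):
    """Map a release total in pounds to a marker radius in pixels.

    log1p compresses the range so large emitters stay readable, and
    min_radius keeps very small emitters visible.
    """
    return max(min_radius, float(np.log1p(pounds)) * scale)

for pounds in [0, 100, 10_000, 1_000_000]:
    print(f"{pounds:>9,} lbs -> radius {bubble_radius(pounds):.1f}")
```

With these defaults, a facility releasing a million pounds gets a bubble only a few times larger than one releasing a hundred, so dense industrial regions remain legible.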
import folium
import numpy as np
# Carcinogen releases by facility, most recent 5 years
recent = releases[releases["year"] >= releases["year"].max() - 4]  # 5 years inclusive
carc = (
    recent[recent["carcinogen"] == 1]
    .groupby("trifid")["total_release"]
    .sum()
    .reset_index()
)
carc = carc.merge(
    facilities[["trifid", "facility_name", "latitude", "longitude", "state"]],
    on="trifid"
).dropna(subset=["latitude", "longitude"])
# Build bubble map
m = folium.Map(location=[39.5, -98.35], zoom_start=4, tiles="CartoDB positron")
for _, row in carc.iterrows():
    radius = max(3, np.log1p(row["total_release"]) * 1.5)
    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=radius,
        color="#ef4444",
        fill=True,
        fill_opacity=0.5,
        popup=f"{row['facility_name']} ({row['state']}): {row['total_release']:,.0f} lbs",
    ).add_to(m)
m.save("tri_carcinogen_map.html")
print("Map saved to tri_carcinogen_map.html")
The free sample contains 1,000 rows from the releases table. The complete dataset covers 3M+ release records across 37 years and both tables, available as CSV and Parquet.