Birds and aircraft have been colliding since the Wright Brothers took flight, but it wasn't until 1990 that the FAA began systematically tracking wildlife strikes in a national database. Since then, the Wildlife Strike Database has grown to over 341,000 reports — covering every collision between a civil aircraft and a bird, mammal, or reptile reported to the FAA. For aviation safety researchers, airport wildlife managers, and AI/ML practitioners building risk models, this is one of the richest and most consistently structured public safety datasets available.
In this tutorial we'll load the ClarityStorm FAA Wildlife Strikes dataset, analyze strike frequency trends, identify the riskiest airports and species, and dig into the economics of engine ingestions. Everything runs in Python with pandas and matplotlib.
What's in the Dataset
The ClarityStorm FAA Wildlife Strikes release covers 341,090 strike events from 1990 through 2025. Each row captures the incident date, airport, phase of flight, species identified, aircraft type, damage level, repair cost, and whether wildlife was ingested into an engine. The pipeline adds several enriched columns beyond the raw FAA export: damage_severity_tier, species_risk_score, season, faa_region_name, and a cost_bucket field for fast bucketing in ML pipelines.
- 341,090 strike events (1990–2025)
- Species identification for ~65% of strikes — 500+ distinct species in the data
- Repair cost fields (raw USD and inflation-adjusted) for events with damage
- Phase of flight for every event: approach, landing roll, take-off run, climb, en route
- Engine ingestion flag, aircraft out-of-service hours, and damage severity tier
- Latitude/longitude for airport-level geospatial analysis
Loading the Data
The Parquet file loads in under a second and preserves correct dtypes for numeric and boolean columns. The free 1,000-row sample CSV is enough to validate your pipeline before purchasing the full release.
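One way to use the sample for validation is to confirm that it carries every column the later steps rely on. The sketch below is a minimal version of that check. The helper name `missing_columns` is hypothetical, and the expected list is simply the set of column names used in this tutorial's snippets, not an official schema.

```python
# Columns referenced by the snippets in this tutorial (not an official schema).
EXPECTED_COLUMNS = [
    "record_id", "incident_date", "incident_year", "airport", "species",
    "damage_label", "phase_of_flight", "any_engine_ingestion",
    "damage_severity_tier",
]


def missing_columns(actual_columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from actual_columns, in order."""
    present = set(actual_columns)
    return [name for name in expected if name not in present]


# Example with a deliberately incomplete header row.
header = ["record_id", "incident_date", "incident_year", "airport", "species"]
print(missing_columns(header))
```

With a DataFrame in hand, `missing_columns(df.columns)` performs the same check, and an empty list means the pipeline should run unchanged.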
import pandas as pd
df = pd.read_parquet("faa_wildlife_strikes.parquet")
print(f"Records: {len(df):,}")
print(f"Year range: {df['incident_year'].min()} – {df['incident_year'].max()}")
print(df[["incident_date", "airport", "species", "damage_label", "phase_of_flight"]].head(5))
Strike Frequency Is Rising Fast
The annual strike count has grown by about 62% over the last decade, from 13,776 in 2015 to 22,372 in 2024. Part of this is better reporting: the FAA has promoted voluntary reporting among pilots, ground crews, and airline maintenance staff. But part is real: bird populations near major airports have grown as suburban sprawl creates more diverse habitat adjacent to runways. In 2020, strikes dropped sharply to 11,625, a direct effect of the COVID-19 traffic reduction, then rebounded strongly from 2021 onward.
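The size of that increase is easy to sanity-check from the two counts quoted above. A minimal sketch follows; the helper name `pct_change` is ours, not part of the dataset tooling.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


strikes_2015 = 13_776  # annual count quoted in the text
strikes_2024 = 22_372  # annual count quoted in the text

growth = pct_change(strikes_2015, strikes_2024)
print(f"2015 -> 2024: {growth:+.1f}%")
```

The result is an increase of roughly 62%, which is substantial but still short of a doubling.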
import matplotlib.pyplot as plt
annual = (
    df[df["incident_year"] >= 2005]
    .groupby("incident_year")
    .size()
    .reset_index(name="strikes")
)
plt.figure(figsize=(12, 5))
plt.bar(annual["incident_year"], annual["strikes"], color="#e05c2e")
plt.title("FAA Wildlife Strikes per Year (2005–2025)")
plt.xlabel("Year")
plt.ylabel("Strike Events")
plt.tight_layout()
plt.savefig("strikes_per_year.png", dpi=150)
Phase of Flight: Approach Is the Danger Zone
Approach accounts for the most strikes by a wide margin (87,725), more than twice as many as landing roll (37,393), take-off run (34,463), or climb (30,469). This makes sense: near the runway, aircraft fly slower and lower, birds are more concentrated, and pilots have less time to react. En-route strikes account for only 6,227 events, since few birds fly at cruising altitude.
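To put those counts on a common scale, we can convert them to percentages. This sketch uses only the five figures quoted above, so the shares are relative to those five phases combined, not to all 341,090 records. The variable names are ours.

```python
# Strike counts by phase of flight, as quoted in the text.
quoted_phase_counts = {
    "approach": 87_725,
    "landing roll": 37_393,
    "take-off run": 34_463,
    "climb": 30_469,
    "en route": 6_227,
}

total = sum(quoted_phase_counts.values())
shares = {
    phase: count / total * 100
    for phase, count in quoted_phase_counts.items()
}

for phase, share in shares.items():
    print(f"{phase:<13} {share:5.1f}%")
```

Approach alone comes to roughly 45% of the five-phase total, while en route is about 3%.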
phase_counts = (
    df.groupby("phase_of_flight")
    .size()
    .sort_values(ascending=False)
    .head(8)
)
phase_counts.plot(kind="barh", figsize=(10, 5), color="#2563eb")
plt.title("Wildlife Strikes by Phase of Flight")
plt.xlabel("Strike Count")
plt.tight_layout()
plt.savefig("strikes_by_phase.png", dpi=150)
The Most Dangerous Species
Mourning doves lead the identified species at 17,847 strikes, followed by barn swallows (11,691) and killdeer (11,403). American kestrels (10,446) and horned larks (9,850) round out the top five. However, species is missing or labeled "unknown" for roughly 35% of reports, which limits species-level risk modeling. For ML use cases, the species_risk_score column normalizes available identifications against damage rate and engine ingestion probability.
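Before modeling at the species level, it is worth measuring how much of your own slice of the data is unidentified. The sketch below assumes, as the filtering code in this tutorial does, that unidentified rows are either missing or start with "Unknown". The helper name and the demo labels are illustrative, not real FAA records.

```python
import pandas as pd


def unidentified_share(species: pd.Series) -> float:
    """Fraction of rows whose species is missing or starts with 'Unknown'."""
    # na=True counts missing values as unidentified.
    unidentified = species.str.startswith("Unknown", na=True).astype(bool)
    return float(unidentified.mean())


# Tiny synthetic example, not real FAA data.
demo_species = pd.Series(
    ["Mourning dove", "Unknown bird - small", None, "Killdeer"]
)
print(f"Unidentified: {unidentified_share(demo_species):.0%}")
```

Calling `unidentified_share(df["species"])` on the full release should land near the 35% figure if the labeling convention assumed here holds.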
top_species = (
    df[~df["species"].str.startswith("Unknown", na=True)]
    .groupby("species")
    .size()
    .sort_values(ascending=False)
    .head(12)
)
top_species.plot(kind="bar", figsize=(12, 5), color="#16a34a")
plt.title("Top 12 Identified Species in Wildlife Strike Reports")
plt.ylabel("Strike Count")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.savefig("top_species.png", dpi=150)
Airport Hotspots
Denver International leads reported strikes at 11,515, followed by Dallas/Fort Worth (8,863), Chicago O'Hare (7,276), and JFK (6,632). High-volume airports naturally log more strikes, so a fair comparison normalizes by operations count, which is possible once the dataset is combined with BTS traffic data. Memphis International ranks high (5,476) despite lower passenger volumes, which reflects its role as a major cargo hub with 24-hour operations: round-the-clock flying puts more aircraft in the air at dusk and dawn, when many birds are active.
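Normalizing by traffic could look roughly like the sketch below. Everything in it is an assumption for illustration: the airport names and numbers are synthetic, and the `total_operations` column stands in for whatever field your BTS extract actually provides.

```python
import pandas as pd

# Synthetic strike counts and operations; NOT real FAA or BTS figures.
demo_strikes_by_airport = pd.DataFrame({
    "airport": ["AIRPORT A", "AIRPORT B", "AIRPORT C"],
    "strikes": [500, 300, 120],
})
demo_operations = pd.DataFrame({
    "airport": ["AIRPORT A", "AIRPORT B"],
    "total_operations": [1_000_000, 200_000],  # hypothetical column name
})

# A left join keeps airports with no traffic match, so gaps stay visible.
rates = demo_strikes_by_airport.merge(
    demo_operations, on="airport", how="left", validate="one_to_one"
)
rates["strikes_per_10k_ops"] = (
    rates["strikes"] / rates["total_operations"] * 10_000
)
print(rates.sort_values("strikes_per_10k_ops", ascending=False))
```

In this toy data the airport with fewer raw strikes has the higher rate, which is exactly the effect raw counts hide. In practice, joining on a stable airport identifier is likely to be more reliable than joining on free-text names.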
top_airports = (
    df[df["airport"] != "UNKNOWN"]
    .groupby("airport")
    .size()
    .sort_values(ascending=False)
    .head(10)
)
top_airports.plot(kind="barh", figsize=(10, 6), color="#7c3aed")
plt.title("Top 10 Airports by Wildlife Strike Reports")
plt.xlabel("Strike Count")
plt.tight_layout()
plt.savefig("top_airports.png", dpi=150)
Engine Ingestions and the Real Cost
Of the 341K events, 3,652 involved engine ingestion — a bird entering a running engine. These are the most dangerous and expensive strike types. The dataset records repair costs for 5,375 events with a total declared cost of $790,357,974. That's a floor: the majority of strikes don't result in a formal cost estimate, and many costs go unreported. Engine ingestion events also logged 17,067 aircraft-out-of-service hours across the full dataset.
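A little arithmetic on the totals quoted above shows how rare and how costly these events are. The sketch uses only figures from the text; the variable names are ours.

```python
total_events = 341_090             # strike events in the release
ingestion_events = 3_652           # events with engine ingestion
events_with_cost = 5_375           # events with a recorded repair cost
total_declared_cost = 790_357_974  # USD, as quoted

ingestion_share = ingestion_events / total_events * 100
cost_coverage = events_with_cost / total_events * 100
mean_cost = total_declared_cost / events_with_cost

print(f"Engine ingestion share: {ingestion_share:.2f}%")
print(f"Events with a cost:     {cost_coverage:.2f}%")
print(f"Mean declared cost:     ${mean_cost:,.0f}")
```

Only about 1.6% of events carry a cost figure, and those average roughly $147,000 each. A mean like this is pulled up by a few very expensive events, so a median computed from the row-level data would be a more robust summary.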
# Engine ingestion risk by species
ingestion_rate = (
    # na=True treats missing species as unidentified, so those rows are dropped too
    df[~df["species"].str.startswith("Unknown", na=True)]
    .groupby("species")
    .agg(strikes=("record_id", "count"), ingestions=("any_engine_ingestion", "sum"))
    .query("strikes >= 100")  # skip species with too few strikes for a stable rate
    .assign(ingestion_rate=lambda x: x["ingestions"] / x["strikes"])
    .sort_values("ingestion_rate", ascending=False)
    .head(10)
)
print(ingestion_rate[["strikes", "ingestions", "ingestion_rate"]].to_string())
Damage Severity Model
A damage value is recorded for about 65% of records; the remainder are blank. Actual damage is much rarer: Minor (8,762), M?, meaning "minor or unknown" (8,447), Substantial (4,388), and Destroyed (88) together account for 21,685 events, roughly 6% of the dataset. The ClarityStorm pipeline adds a damage_severity_tier column (Low / Medium / High / Critical) that aggregates the raw FAA codes into a four-class label suitable for classification tasks.
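To make the idea of collapsing raw damage labels into tiers concrete, here is a minimal sketch. The mapping is purely hypothetical: the actual ClarityStorm rules are not documented in this tutorial, and the raw label spellings used here are assumed from the class names mentioned above.

```python
import pandas as pd

# HYPOTHETICAL mapping from raw damage labels to four tiers.
# The real pipeline's rules may differ.
ASSUMED_TIER_MAP = {
    "None": "Low",
    "Minor": "Medium",
    "M?": "Medium",
    "Substantial": "High",
    "Destroyed": "Critical",
}


def to_severity_tier(damage_label: pd.Series) -> pd.Series:
    """Map raw damage labels to tiers; blank or unmapped labels become NaN."""
    return damage_label.map(ASSUMED_TIER_MAP)


# Synthetic labels for illustration only.
demo_labels = pd.Series(["None", "M?", "Substantial", "Destroyed", None])
demo_tiers = to_severity_tier(demo_labels)
print(demo_tiers.tolist())
```

Leaving unmapped labels as NaN, rather than forcing them into a tier, keeps missing damage information visible to whatever model consumes the column.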
severity_dist = df["damage_severity_tier"].value_counts()
severity_dist.plot(kind="bar", figsize=(8, 4), color="#dc2626")
plt.title("Wildlife Strike Damage Severity Tier Distribution")
plt.ylabel("Event Count")
plt.xticks(rotation=0)
plt.tight_layout()
plt.savefig("damage_severity.png", dpi=150)
AI/ML Use Cases
- Damage prediction: binary or multiclass classifier on species, aircraft type, airport, season, and phase of flight
- Engine ingestion risk scoring: probability model for high-consequence events — useful for wildlife hazard assessment tools
- Airport risk benchmarking: combine with BTS operations data to compute strike-per-10K-ops rates by facility
- Species behavior modeling: seasonal and geographic patterns in species risk scores support wildlife management planning
- Insurance underwriting: airline and general aviation risk models using phase of flight, aircraft mass class, and species_risk_tier
- Computer vision training data: records flagged with has_image can be linked to strike photos to help curate image classification datasets
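As a starting point for the damage prediction use case, the sketch below prepares a feature matrix and a binary target with pandas alone. The rows are synthetic, and treating High or Critical as the positive class is our assumption; the category values shown are placeholders for whatever the real columns contain.

```python
import pandas as pd

# Synthetic rows for illustration only; not real FAA data.
demo_rows = pd.DataFrame({
    "phase_of_flight": ["Approach", "Climb", "Approach", "Take-off Run"],
    "season": ["Spring", "Fall", "Fall", "Summer"],
    "damage_severity_tier": ["Low", "High", "Low", "Critical"],
})

FEATURES = ["phase_of_flight", "season"]

# One-hot encode the categorical features into 0/1 indicator columns.
X = pd.get_dummies(demo_rows[FEATURES], dtype=int)

# Binary target: 1 for High or Critical, 0 otherwise (an assumed cut-off).
y = demo_rows["damage_severity_tier"].isin(["High", "Critical"]).astype(int)

print(X.shape, y.tolist())
```

From here, `X` and `y` can be passed to the classifier of your choice. Given how rare the damaging classes are, class weighting or resampling is likely to matter more than the choice of model.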
The free sample contains 1,000 rows. The complete FAA Wildlife Strike Database covers 341K+ events from 1990–2025, available as CSV and Parquet with a commercial license for $79.