Latent bugs wait for the right trigger
The defect had been live in production for nearly four weeks before any customer's configuration happened to activate it — a reminder that "no incidents yet" is not the same as "no risk present."
A single valid configuration change, pushed by one Fastly customer, triggered a latent bug that had been in production for nearly four weeks and knocked 85% of Fastly's network offline, taking Reddit, Amazon, Twitch, government sites, and major news outlets down with it worldwide, all at once.
Fastly never disclosed a financial cost for this incident, which is itself worth noting — this case study is included for its speed-of-response lesson, not its dollar figure.
This incident shows both the risk of shared infrastructure and what a well-rehearsed incident response looks like.
A single tenant's valid, permitted configuration change was enough to degrade the shared platform for every other customer — multi-tenant infrastructure needs blast-radius controls that don't rely on any one customer behaving conservatively.
Detection within one minute, and recovery of 95% of the network within 49 minutes, kept this incident short enough that no company involved appears to have disclosed a specific loss. That is strong evidence that investment in mean time to recovery (MTTR) pays off exactly when it's needed most.
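The blast-radius point above is easier to picture with a small sketch. What follows is a hypothetical illustration, not a description of how Fastly deploys customer configuration: it assumes the platform is split into independent "cells", applies a change to one cell at a time, and halts as soon as a cell's error rate crosses a threshold. The names `staged_rollout`, `RolloutResult`, `apply_config`, `rollback_config`, `error_rate`, and the 5% threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RolloutResult:
    applied: List[str]      # cells that kept the new configuration
    rolled_back: List[str]  # cells reverted after a failed health check
    halted: bool            # True if the rollout stopped before reaching every cell


def staged_rollout(
    cells: List[str],
    apply_config: Callable[[str], None],
    rollback_config: Callable[[str], None],
    error_rate: Callable[[str], float],
    max_error_rate: float = 0.05,  # assumed threshold, for illustration only
) -> RolloutResult:
    """Apply a change cell by cell and stop at the first unhealthy cell."""
    applied: List[str] = []
    for cell in cells:
        apply_config(cell)
        if error_rate(cell) > max_error_rate:
            # Revert only the failing cell; later cells never see the change.
            rollback_config(cell)
            return RolloutResult(applied, [cell], halted=True)
        applied.append(cell)
    return RolloutResult(applied, [], halted=False)


def simulate_latent_bug() -> RolloutResult:
    """Simulate a valid change that triggers a latent bug in any cell it reaches."""
    state = {"canary": "old", "region-a": "old", "region-b": "old"}

    def apply(cell: str) -> None:
        state[cell] = "new"

    def rollback(cell: str) -> None:
        state[cell] = "old"

    def errors(cell: str) -> float:
        # Hypothetical: the bug makes most requests fail wherever the change lands.
        return 0.85 if state[cell] == "new" else 0.0

    return staged_rollout(list(state), apply, rollback, errors)
```

In this simulation the rollout halts at the first cell, so the other two cells never receive the change. The design choice is that containment depends on the platform's rollout order and health check, not on any one customer submitting a conservative configuration.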
Questions that come up when citing this incident in a CDN-dependency or incident-response case.