Case study · February 2017

A mistyped command took down a slice of the internet.

A single incorrect input to a routine debugging command removed far more S3 servers than intended, and the restart took substantially longer than expected because the affected subsystems had not been fully restarted in years.

~4h Duration in the US-EAST-1 region

$150M Estimated cost to S&P 500 companies

Quick facts

What happened, in one table.

Sources are linked inline, including AWS's own public post-incident summary.

Date: February 28, 2017, starting around 9:37 a.m. PST.

What broke: An authorized engineer, following an established playbook to debug the S3 billing system, ran a command meant to remove a small number of servers. One input was entered incorrectly, so the command removed a much larger set of servers than intended and took two core S3 subsystems offline, per AWS's own summary.

Scale: The outage lasted roughly four hours in the US-EAST-1 region and disrupted a wide range of unrelated websites and apps that depended on S3 for storage or configuration. At first, AWS could not update its own Service Health Dashboard, because the dashboard's administration console itself depended on S3.

Recovery bottleneck: The affected subsystems had grown substantially over years of operation and had not been fully restarted at that scale, so the restart took substantially longer than anticipated. That was a capacity and operational-testing gap, not a repeat of the original mistake.

Reported cost: The Wall Street Journal reported an estimate from cyber-risk modeling firm Cyence that the outage cost S&P 500 companies about $150 million in aggregate. The figure is widely cited, but it is a third-party modeled estimate, not a sum of individual company disclosures.

Why it cost so much

The blast radius was the entire internet's dependency graph.

Almost none of the cost came from the billing system the engineer was debugging. It fell on everyone who depended on S3 in that region, often without realizing how deeply.

A routine command is still a production change

The operator was following an established playbook, not improvising — yet a single mistyped input had an outsized blast radius, which is why input validation and blast-radius limits matter even for "routine" operational commands.
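A guard of this kind can be sketched in a few lines. The example below is a minimal illustration, not AWS's actual tooling: the names `RemovalPolicy`, `validate_removal`, `min_capacity`, and `max_per_command` are hypothetical, and the numbers are invented for the example. It rejects a removal request that exceeds a per-command cap or that would leave the fleet below a minimum capacity floor.

```python
from dataclasses import dataclass


class UnsafeRemovalError(Exception):
    """Raised when a removal request violates the safety policy."""


@dataclass(frozen=True)
class RemovalPolicy:
    # Both limits are hypothetical knobs chosen for illustration.
    min_capacity: int     # fewest servers the subsystem needs to stay healthy
    max_per_command: int  # most servers a single command may remove


def validate_removal(active: int, requested: int, policy: RemovalPolicy) -> int:
    """Return the approved removal count, or raise if the request is unsafe."""
    if requested <= 0:
        raise UnsafeRemovalError("requested removal must be a positive count")
    if requested > policy.max_per_command:
        raise UnsafeRemovalError(
            f"refusing to remove {requested} servers; "
            f"the per-command limit is {policy.max_per_command}"
        )
    if active - requested < policy.min_capacity:
        raise UnsafeRemovalError(
            f"removing {requested} of {active} servers would drop below "
            f"the minimum capacity of {policy.min_capacity}"
        )
    return requested
```

With a policy of `RemovalPolicy(min_capacity=80, max_per_command=5)` and 100 active servers, a request for 3 passes, while a mistyped request for 50 is refused before anything is removed. Checking the per-command cap first means an oversized input fails even when the fleet could technically absorb it.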

Systems that never restart are systems you haven't tested

The recovery took longer than expected specifically because the affected subsystems hadn't been restarted at their current scale before. Untested recovery paths are a hidden source of mean-time-to-recovery (MTTR) risk that capacity growth quietly creates.
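One way to make that risk visible is to track when each subsystem last went through a full restart and how much it has grown since. The sketch below is an assumption-laden illustration: the `Subsystem` fields, the `restart_risk` function, and the default thresholds (one year, 2x growth) are all hypothetical, and the example figures are not AWS data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Subsystem:
    name: str
    current_servers: int
    servers_at_last_restart: int  # fleet size when a full restart was last exercised (> 0)
    last_full_restart: date


def restart_risk(
    sub: Subsystem,
    today: date,
    max_age_days: int = 365,   # hypothetical threshold
    max_growth: float = 2.0,   # hypothetical threshold
) -> list[str]:
    """Return the reasons a subsystem's restart path should be re-tested."""
    reasons: list[str] = []
    age_days = (today - sub.last_full_restart).days
    if age_days > max_age_days:
        reasons.append(f"last full restart was {age_days} days ago")
    growth = sub.current_servers / sub.servers_at_last_restart
    if growth > max_growth:
        reasons.append(f"fleet has grown {growth:.1f}x since the last full restart")
    return reasons
```

An empty list means neither threshold was crossed. A subsystem that was last restarted years ago at a third of its current size would return both reasons, which is the signal to schedule a restart drill before an incident forces one.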

Third-party dependency is invisible until it fails

Companies that had nothing to do with the billing system being debugged still went down, because their own infrastructure quietly depended on the same regional storage layer. It is a reminder to map, not assume, your actual blast radius from a single vendor's region.
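Mapping that blast radius amounts to a graph traversal. The following is a minimal sketch, assuming you can express your services as a dictionary of direct dependencies; the function name `blast_radius` and the service names in the usage note are hypothetical. It inverts the dependency map and walks it breadth-first to collect everything that depends, directly or transitively, on a failed component.

```python
from collections import deque


def blast_radius(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map: for each dependency, record which services rely on it.
    dependents: dict[str, set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the failed component.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for service in dependents.get(node, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)

    # In a cyclic graph the failed node can reach itself; it is the cause, not a victim.
    affected.discard(failed)
    return affected
```

For example, if `checkout` depends on `image-cdn` and `payments-api`, and both `image-cdn` and `status-page` depend on `s3:us-east-1`, then `blast_radius(graph, "s3:us-east-1")` returns `image-cdn`, `status-page`, and `checkout`, even though `checkout` never references S3 directly. The result is only as complete as the dependency map you feed it, so vendor-of-vendor dependencies need to be included to be seen.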

FAQ

AWS S3 outage, answered.

Questions that come up when citing this incident in a cloud-dependency or vendor-risk case.

Was this an attack on AWS? No — AWS attributed it to an internal operational error during a routine debugging procedure, not any external attack.

Why is the $150 million figure a third-party estimate rather than AWS's own number? AWS does not publish a cost estimate for its own outages; the $150 million figure comes from Cyence's cyber-risk modeling as reported by the Wall Street Journal, making it directional rather than an audited total.

What did AWS change afterward? AWS's public summary described changes to its tooling, including safeguards to prevent removing capacity below a minimum required level and improvements to subsystem restart time.

How would this map to the calculator? Use the IT downtime calculator or website downtime calculator depending on whether you're modeling internal infrastructure or customer-facing impact from a vendor outage.

Your turn

What would a cloud provider outage cost you?

Model your own dependency footprint, revenue, and recovery time using the same formula.
