01
A routine command is still a production change
The operator was following an established playbook, not improvising — yet a single mistyped input had an outsized blast radius, which is why input validation and blast-radius limits matter even for "routine" operational commands.
02
Systems that never restart are systems you haven't tested
The recovery took longer than expected specifically because the affected subsystems hadn't been restarted at their current scale before — untested recovery paths are a hidden source of MTTR risk that capacity growth quietly creates.
03
Third-party dependency is invisible until it fails
Companies with no direct relationship to AWS's billing subsystem still went down, because their own infrastructure quietly depended on the same regional storage layer — a reminder to map, not assume, your actual blast radius from a single vendor's region.