Turns out, the people who work at Amazon actually are human. We can say that definitively now that we know what brought the company's Amazon Web Services (AWS) platform to a grinding halt: a stupid typo. You might make hundreds of them a day, but they have likely never brought your entire business (or hundreds, maybe even thousands, of other businesses) to a complete standstill.
On Tuesday, Amazon reported a partial failure of its S3 storage service, which caused numerous media and business websites to slow significantly or crash altogether. Affected sites included popular destinations like Slack, Trello, the U.S. Securities and Exchange Commission, Splitwise, Business Insider, and Medium. "[C]ustomers experienced increased faults for CloudWatch alarms APIs and delays in processing some alarms in the US-EAST-1 Region," Amazon said on its AWS Service Health Dashboard that day. "The issue has been resolved and the service is operating normally."
After spending the next two days investigating the outage, Amazon released a lengthy report on Thursday explaining exactly what happened. According to Amazon, "an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended." Long story short: someone mistyped an input to a command, and that resulted in a four-hour outage that took a good chunk of the internet offline.