By: Rachel Frnka
December 7 started as a typical, but busy, pre-holiday weekday. This included a mix of booming online retail sales ($33.9 billion spent during cyber week), packages flooding delivery services, and high online traffic. But much of that quickly slowed to a crawl. An outage of the AWS us-east-1 cloud region reversed that good fortune for many websites and applications and affected consumers across the United States and parts of Europe.
Starting around 10:45 a.m. ET, according to DownDetector, over 11,000 websites were down, ranging from AWS services and Amazon’s network of brands to seemingly separate websites such as Disney+ and Instacart.
Suddenly, Amazon’s fleet of delivery drivers was unable to deliver packages because the custom-built app that details which packages go where could not connect to the AWS server. Alexa-powered speakers, lightbulbs, and smart devices stopped functioning. Ring devices were no longer streaming live video. At the same time, Netflix saw a dramatic decline in traffic, day traders on Robinhood could no longer trade, and singles turning to Tinder were disappointed as they couldn’t swipe throughout the day. It’s hard to point to an industry that wasn’t affected by the outage because, unbeknownst to many, AWS powers such a large portion of the internet.
Outages aren’t abnormal. In fact, thousands of outages happen every day. Large corporations aren’t immune to performance problems with their infrastructure. Remember Facebook/Meta’s 5+ hour outage in October that rocked the world of social media and communication?
What was concerning on the morning of December 7 was the scale of the outages and how many people and businesses were impacted. As the source was traced to the AWS us-east-1 region, it became obvious how much of the internet runs on the services provided by Amazon. But this isn’t the first AWS outage to impact much of the internet. Let’s also not forget that last year us-east-1 went down for most of the day on Wednesday, November 25. Sound familiar?
As we think about 2022 and how we protect our websites and web applications from vulnerabilities out of our control—like another potential AWS outage—we need to think about proactive monitoring and what our reliance on cloud infrastructure looks like.
AWS is a huge, usually extremely reliable service. It’s larger than its two main competitors, Azure and Google Cloud, combined. But an incident such as December 7’s highlights the importance of embracing a multi-cloud strategy, along with proactive monitoring that tells you when you might need to switch to your backup cloud. While Amazon applications went down Tuesday morning, many of the websites reliant on AWS through API calls or AWS services began to slow down. Fully optimized pages were then loading at slower-than-recommended speeds, and transactions like logins and checkouts were timing out.
When pages begin to slow seemingly out of nowhere, it’s important to treat these anomalies as an incident management situation. The image above shows real data from SolarWinds® Scopify® page speed testing for a travel booking website monitored from our Eastern US probe: load time spiked from about 3 seconds to over 8 seconds. At that moment, having recognized a page loading problem, the web development team could have dug into where the slow-loading elements were (such as an API call to AWS services) and switched to their backup cloud until the incident was resolved.
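To make the idea of spotting a slowdown concrete, here is a minimal sketch of one way to flag a load-time spike against a recent baseline. The function name, the five-sample window, the 2x factor, and the sample values are all assumptions chosen for illustration; they are not taken from any monitoring product.

```python
from statistics import median


def detect_spikes(load_times, window=5, factor=2.0):
    """Return the indices of samples that exceed `factor` times the
    median of the `window` samples immediately before them.

    `window` and `factor` are illustrative defaults, not recommendations.
    """
    spikes = []
    for i in range(window, len(load_times)):
        baseline = median(load_times[i - window:i])
        if load_times[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical page load times in seconds: steady near 3 s, then a jump past 8 s.
samples = [3.1, 2.9, 3.0, 3.2, 3.0, 3.1, 8.2, 8.5, 8.4, 3.0]
print(detect_spikes(samples))  # prints [6, 7, 8]
```

Using the median rather than the mean keeps one or two slow samples from dragging the baseline up right away, so a sustained slowdown keeps getting flagged for a few checks instead of being absorbed immediately.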
For many businesses, this would have kept their sites up and their customers happy. A travel site would still have been able to book flights or hotel stays. Trading applications could have made sure their users didn’t lose money on their planned trades. When online businesses rely on their web apps’ availability and speed, it’s critical to know when to switch to your backup cloud.
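Knowing when to switch can be reduced to a simple rule. The sketch below is one possible approach, not a prescribed one: fail over after several consecutive slow or timed-out checks of the primary, and return only after a longer run of healthy checks. The class name, the thresholds, and the example.com URLs are hypothetical.

```python
class FailoverSwitch:
    """Decide which endpoint to serve from, based on checks of the primary."""

    def __init__(self, primary, backup, fail_after=3, recover_after=5):
        self.primary = primary
        self.backup = backup
        self.fail_after = fail_after        # consecutive bad checks before failing over
        self.recover_after = recover_after  # consecutive good checks before switching back
        self.active = primary
        self._bad = 0
        self._good = 0

    def record(self, load_seconds, threshold=5.0):
        """Record one check of the primary (None means it timed out).

        Returns the endpoint that should be active after this check.
        """
        healthy = load_seconds is not None and load_seconds <= threshold
        if healthy:
            self._good += 1
            self._bad = 0
        else:
            self._bad += 1
            self._good = 0

        if self.active == self.primary and self._bad >= self.fail_after:
            self.active = self.backup
        elif self.active == self.backup and self._good >= self.recover_after:
            self.active = self.primary
        return self.active


# Hypothetical endpoints and check results (seconds; None is a timeout).
switch = FailoverSwitch("https://primary.example.com", "https://backup.example.com")
for result in [3.0, 8.2, 8.5, None]:
    print(switch.record(result))
```

Requiring more good checks to return than bad checks to leave is a deliberate choice in this sketch: it avoids flapping between clouds while the primary is still recovering.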
It’s not realistic to think a major outage like this won’t occur in the future: it will. What you don’t know is how future outages might affect your site. Will you know in the moment, before it’s too late, that you’re affected?
Get ahead of slowdowns and outages and beat the competition by keeping your website up, fast, and ready for business. Get proactive monitoring free for 30 days for your availability, page speed, real user, and interaction monitoring.