Cloud Outages: Who's at Fault? - InformationWeek



Cloud Outages: Who’s at Fault?

Cloud outages do happen. So, how can you and your IT group minimize the impact on the company?

Imagine a scenario where the unthinkable happens. Your company’s cloud provider suffers a major outage that grinds business to a halt. While the IT department and CIO are busy blaming the provider for the outage, the rest of the organization is likely to blame the internal IT staff for the disruption in service. That raises the question: Where does the responsibility for outages lie, and what can be done to mitigate outage risk?

An “unthinkable” scenario of a catastrophic cloud outage isn’t nearly as far-fetched as some might think. Want some recent proof? How about Amazon’s massive 5-hour AWS outage that occurred in February? It affected services such as Quora, GitHub and Docker. Another recent example is Rackspace’s 3-hour worldwide cloud outage, which affected popular SaaS products including Cisco Spark.

Image: Pixabay/metsi

Clearly, cloud outages remain a fairly common occurrence. IT leadership must recognize this and understand that they aren’t off the hook when it comes to outages that could impact a company’s bottom line. When you outsource infrastructure management to a third-party cloud provider, you’re trusting that the provider will deliver the level of availability outlined in its service level agreement (SLA). But it’s important to note that transferring trust for the support of underlying network components is not a blanket transfer of responsibility when outages occur. For IT departments, the SLA is the first line of defense. If an SLA does not meet your requirements, it’s up to you to seek out providers that offer more robust solutions with higher penalties if the agreed service level is breached.
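
To judge whether an SLA meets your requirements, it can help to translate its availability figure into actual downtime. SLAs commonly state availability as a percentage of uptime over a billing period. The short Python sketch below does that conversion. The 30-day month and the three sample percentages are assumptions chosen for illustration, not figures from any particular provider’s agreement.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float = 30 * 24) -> float:
    """Return the minutes of downtime an SLA permits over one period.

    availability_pct: promised uptime, e.g. 99.9 for "three nines".
    period_hours: length of the measurement period (assumed 30-day month).
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_hours * 60
    return period_minutes * (100 - availability_pct) / 100


if __name__ == "__main__":
    # Illustrative availability tiers only.
    for pct in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per 30-day month")
```

Seen this way, a 5-hour outage far exceeds what a 99.9% monthly commitment allows, which is the point at which the SLA’s penalty terms start to matter.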

Beyond the SLA, there are plenty of other ways that cloud customers can limit the impact of a major service provider outage. One way is to leverage a hybrid cloud approach where you load balance between on-premises and public cloud resources. That way, an outage in one segment of the infrastructure will not completely knock your applications offline. Multi-cloud strategies are also becoming popular. This is especially true now that administrators have a wide array of multi-cloud management platforms that significantly reduce the effort required to work across differently architected cloud environments.
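
The hybrid idea can be sketched in a few lines of Python: rotate requests across an on-premises backend and a public cloud backend, and skip whichever one fails a health check. This is a simplified illustration, not a production load balancer. The `Backend` class, the `route_requests` function, the backend names and the URLs are all hypothetical, and the health check is passed in as a plain function so the sketch stays self-contained.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Sequence


@dataclass(frozen=True)
class Backend:
    name: str  # hypothetical label such as "on-prem" or "public-cloud"
    url: str   # placeholder address, not a real endpoint


def route_requests(
    backends: Sequence[Backend],
    is_healthy: Callable[[Backend], bool],
    n_requests: int,
) -> list[str]:
    """Round-robin across backends, skipping any that fail the health check.

    Returns the name of the backend chosen for each request, or
    "unavailable" when no backend is healthy.
    """
    routed = []
    rotation = cycle(backends)
    for _ in range(n_requests):
        # Try each backend at most once per request.
        for _ in range(len(backends)):
            candidate = next(rotation)
            if is_healthy(candidate):
                routed.append(candidate.name)
                break
        else:
            routed.append("unavailable")
    return routed


if __name__ == "__main__":
    on_prem = Backend("on-prem", "https://app.onprem.example")
    cloud = Backend("public-cloud", "https://app.cloud.example")

    # Simulate a public cloud outage: only the on-premises side is healthy.
    print(route_requests([on_prem, cloud], lambda b: b.name != "public-cloud", 4))
```

With both backends healthy, requests alternate between the two. During the simulated cloud outage, every request lands on the on-premises backend, so the application stays reachable at reduced capacity.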

For those of you who already have robust plans and network designs in place that sufficiently reduce the impact of a potential public cloud outage, I have one more question: What is your strategy for the uptime of shadow IT applications?

Even though an employee or department skirted standard operating procedures for application usage on the corporate network, it remains the duty of the IT department to track down these apps and do whatever is possible to manage accessibility risk. This is where a shadow IT outreach program could be used to identify and wrap protection around unauthorized applications that remain critical to the business.

The one caveat to cloud outage risk mitigation is that it’s not going to be free. Cloud service providers that offer more robust infrastructures and improved SLAs are going to demand a premium price. Implementing hybrid, multi-cloud and other cloud resiliency protocols and procedures will also cost time and money.

If the business decides across the board not to pay for this type of risk mitigation, that’s one thing. But IT departments and IT leadership must, at minimum, perform their due diligence and provide a cost/benefit analysis based on the probability that a cloud outage will economically impact business operations. In some cases, that economic impact will be so low that the added time and money spent to bolster cloud resiliency is not worth the investment. But for most, at least some form of added protection will be money well spent. It’s simply up to the IT department to determine exactly where that level of protection should be set.
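
One simple way to frame such a cost/benefit analysis is as an expected-loss calculation: estimate what outages cost per year, estimate how much of that loss a mitigation would avoid, and compare the savings with the mitigation’s price. The Python sketch below shows the arithmetic. The function names and every input figure are made-up assumptions for illustration, and a real analysis would need estimates drawn from your own business.

```python
def expected_annual_outage_cost(
    outages_per_year: float, hours_per_outage: float, cost_per_hour: float
) -> float:
    """Expected yearly loss: outage frequency x duration x hourly business impact."""
    return outages_per_year * hours_per_outage * cost_per_hour


def mitigation_net_benefit(
    baseline_loss: float, risk_reduction: float, mitigation_cost: float
) -> float:
    """Loss avoided by the mitigation minus what the mitigation costs per year.

    risk_reduction is the fraction of the baseline loss avoided (0.0 to 1.0).
    A positive result suggests the mitigation pays for itself.
    """
    if not 0 <= risk_reduction <= 1:
        raise ValueError("risk_reduction must be between 0 and 1")
    return baseline_loss * risk_reduction - mitigation_cost


if __name__ == "__main__":
    # Illustrative inputs only: 2 outages a year, 4 hours each, $10,000 per hour.
    baseline = expected_annual_outage_cost(2, 4, 10_000)
    # Assume a hybrid setup avoids 75% of that loss and costs $25,000 a year.
    net = mitigation_net_benefit(baseline, 0.75, 25_000)
    print(f"Expected annual outage cost: ${baseline:,.0f}")
    print(f"Net annual benefit of mitigation: ${net:,.0f}")
```

If the net benefit comes out negative, the business falls into the case described above, where added resiliency is not worth the investment.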

Andrew has well over a decade of enterprise networking under his belt through his consulting practice, which specializes in enterprise network architectures and datacenter build-outs, and prior experience at organizations such as State Farm Insurance, United Airlines and the ...