Cloud Outages: Who's at Fault? - InformationWeek



Cloud outages do happen. So, how can you and your IT group minimize the impact on the company?

Imagine a scenario where the unthinkable happens. Your company’s cloud provider suffers a major outage that grinds business to a halt. While the IT department and CIO are busy blaming the provider for the outage, the rest of the organization is likely to blame the internal IT staff for the disruption in service. That raises the question: Where does the responsibility for outages lie, and what can be done to mitigate outage risk?

An “unthinkable” scenario of a catastrophic cloud outage isn’t nearly as far-fetched as some might think. Want some recent proof? How about Amazon’s massive 5-hour AWS outage that occurred in February? It affected services such as Quora, GitHub and Docker. Another recent example is Rackspace’s 3-hour worldwide cloud outage, which affected popular SaaS products including Cisco Spark.

Image: Pixabay/metsi

Clearly, cloud outages remain a fairly common occurrence. IT leadership must recognize this and understand that they aren’t off the hook when it comes to outages that could impact a company’s bottom line. When you outsource infrastructure management to a third-party cloud provider, you’re trusting that the provider will adhere to the level of availability outlined in its service level agreement (SLA). But it’s important to note that transferring trust for the underlying network components is not a blanket transfer of responsibility when outages occur. For IT departments, the SLA is the first line of defense. If an SLA does not meet your requirements, it’s up to you to seek out providers that offer more robust solutions with higher penalties if the agreed service level is breached.
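
To put an SLA’s availability figure in perspective, it helps to convert the percentage into permitted downtime. The sketch below is only an illustration: the function name and the 30-day month are assumptions, and real SLAs define their own measurement windows and exclusions.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given availability percentage.

    Assumes a simple calendar window; actual SLAs may measure differently.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Compare a few common availability tiers (illustrative values only).
for pct in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per 30 days")
```

Under these assumptions, a 99.9% commitment permits roughly 43 minutes of downtime in a 30-day month, so a single 5-hour (300-minute) outage would exceed it several times over.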

Beyond the SLA, there are plenty of other ways that cloud customers can limit the impact of a major service provider outage. One way is to leverage a hybrid cloud approach where you load balance between on-premises and public cloud resources. That way, an outage in one segment of the infrastructure will not completely knock your applications offline. Multi-cloud strategies are also becoming popular. This is especially true now that administrators have a wide array of multi-cloud management platforms that significantly reduce the effort required when working inside differently architected cloud environments.
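
The idea behind hybrid and multi-cloud resiliency can be sketched as a simple failover selection: try each environment in order of preference and use the first one that passes a health check. This is a simplified failover sketch rather than true load balancing, and the backend names and the health-check callback are hypothetical placeholders, not any vendor’s API.

```python
from typing import Callable, Optional, Sequence


def pick_backend(
    backends: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first backend that passes its health check, or None if all are down."""
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None


# Hypothetical environments, listed in order of preference.
BACKENDS = ["on-prem", "public-cloud-a", "public-cloud-b"]

# Simulate an outage in the on-premises environment.
down = {"on-prem"}
choice = pick_backend(BACKENDS, lambda name: name not in down)
print(choice)
```

In practice this decision usually lives in a load balancer, DNS service or multi-cloud management platform rather than in application code, but the principle is the same: no single environment should be the only path to the application.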

For those of you who already have robust plans and network designs in place that sufficiently reduce the impact of a potential public cloud outage, I have one more question: What is your strategy for the uptime of shadow IT applications?

Even though an employee or department skirted standard operating procedures for application usage on the corporate network, it remains the duty of the IT department to track down these apps and do whatever is possible to manage accessibility risk. This is where a shadow IT outreach program could be used to identify and wrap protection around unauthorized applications that remain critical to the business.

The one caveat to cloud outage risk mitigation is that it’s not going to be free. Cloud service providers that offer more robust infrastructures and improved SLAs are going to demand a premium price. Implementing hybrid, multi-cloud and other cloud resiliency protocols and procedures will also cost time and money.

If the business makes a company-wide decision not to pay for this type of risk mitigation, that’s one thing. But IT departments and IT leadership must, at minimum, perform their due diligence and provide a cost/benefit analysis based on the probability that a cloud outage will economically impact business operations. In some cases, that economic impact will be so low that the added time and money spent to bolster cloud resiliency is not worth the investment. But for most, at least some form of added protection will be money well spent. It’s simply up to the IT department to determine exactly where that level of protection should be.
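
A cost/benefit analysis of this kind can start with a back-of-the-envelope expected-loss calculation. The sketch below is one simple way to frame it. Every figure in it (outage frequency, duration, hourly cost and mitigation spend) is invented for illustration and should be replaced with your own estimates.

```python
def expected_annual_outage_cost(
    outages_per_year: float,
    hours_per_outage: float,
    cost_per_hour: float,
) -> float:
    """Expected yearly loss: frequency x duration x hourly business impact."""
    return outages_per_year * hours_per_outage * cost_per_hour


def mitigation_worthwhile(
    baseline_cost: float,
    residual_cost: float,
    mitigation_cost: float,
) -> bool:
    """True if the loss avoided by mitigation exceeds what the mitigation costs."""
    return (baseline_cost - residual_cost) > mitigation_cost


# Hypothetical estimates: 2 outages a year, 4 hours each, $10,000 lost per hour.
baseline = expected_annual_outage_cost(2, 4, 10_000)

# Hypothetical effect of mitigation: outages still occur but last 0.5 hours.
residual = expected_annual_outage_cost(2, 0.5, 10_000)

# Hypothetical yearly spend on added resiliency.
print(mitigation_worthwhile(baseline, residual, mitigation_cost=50_000))
```

If the avoided loss comes out smaller than the mitigation spend, that supports the case where added resiliency is not worth the investment. If it comes out larger, the protection is likely money well spent.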

Andrew has well over a decade of enterprise networking under his belt through his consulting practice, which specializes in enterprise network architectures and datacenter build-outs, and prior experience at organizations such as State Farm Insurance, United Airlines and the ...