The Amazon Outage in Perspective: Failure Is Inevitable, So Manage Risk | Opinions | ChannelWorld.in


The Amazon Outage in Perspective: Failure Is Inevitable, So Manage Risk

By Bernard Golden on Nov 08, 2012

Bernard Golden is the vice president of Enterprise Solutions for enStratus Networks, a cloud management software company. He is the author of three books on virtualization and cloud computing, including Virtualization for Dummies. Follow Bernard Golden on Twitter @bernardgolden.

An endless stream of tweets and blog posts has noted, described and bewailed last week's Amazon Web Services outage. Some people characterized the outage as an indictment of public cloud computing in general. Others, some of whom work at other cloud providers, characterized it as indicative of AWS-specific shortcomings. Still others used the event as an opportunity to outline how users have to be sure to hammer home SLA penalty clauses during contract negotiations, just to ensure protection from outages.

Most of these responses reflect bias or the commenter's own agenda and fail to draw the proper lessons from this outage. More crucially, they fail to offer really useful advice or recommendations, preferring to proffer outmoded or alternative solutions that do not provide risk mitigation strategies appropriate for the new world of IT.


The first thing to look at is what risk really is. Wikipedia calls it "the probable frequency and probable magnitude of future loss." In other words, risk can be ascertained by how often a problem occurs and how much that problem is likely to cost. Naturally, one has to evaluate how valuable mitigation efforts to address a risk are, given the cost of mitigation. Spending $1 to protect oneself against a $1,000 loss would seem to make sense, while spending $1,000 to protect oneself against a $1 loss is foolish. 
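
That arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are invented for the example, and it assumes one loss event per year and a mitigation that removes the loss entirely unless told otherwise.

```python
def expected_loss(events_per_year: float, loss_per_event: float) -> float:
    """Probable frequency times probable magnitude: expected loss per year."""
    return events_per_year * loss_per_event


def mitigation_is_worthwhile(mitigation_cost: float,
                             loss_before: float,
                             loss_after: float = 0.0) -> bool:
    """True when the loss avoided is larger than what the mitigation costs."""
    return (loss_before - loss_after) > mitigation_cost


# The two cases from the text, assuming (hypothetically) one event per year.
cheap_fix = mitigation_is_worthwhile(1, expected_loss(1, 1_000))
costly_fix = mitigation_is_worthwhile(1_000, expected_loss(1, 1))
print(cheap_fix, costly_fix)  # True False
```

In practice the hard part is not the multiplication but estimating the two inputs honestly.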

The question for users is whether this outage presents a large enough loss that continuing to use AWS is no longer justified (i.e., is too risky) and that other solutions should be pursued. Certainly there are now applications running on AWS that represent millions or even tens of millions of dollars of annual revenue, so this question is quite germane.

In terms of this specific outage, Amazon posted an explanation that describes it as a combination of some planned maintenance, a failure to update some internal configuration files and a memory leak. The result was poor availability of Amazon's Elastic Block Store (EBS) service.

Interestingly, the last large AWS outage was also an EBS failure. Even more interestingly, it had an entirely different cause, although human error was the trigger then as well. In both cases, someone misconfigured an EBS resource, which triggered an unexpected condition, resulting in a service outage.

Most interesting of all, AWS says users shouldn't be surprised by this occurrence. Amazon's No. 1 design principle: "everything fails all the time; design your application for failure and it will never fail."

Many people are outraged by this, feeling that a service provider should take responsibility for ensuring 100% (or at least "five nines") service availability. Amazon's attitude, they imply, is irresponsible. The right solution, they say, is for users to look to a provider that is willing to take responsibility and provide a service that is truly reliable, made possible by use of so-called "enterprise-grade" hardware and software backstopped by ironclad change control.

There Is No "Right" Equipment, No Matter What Your SLA Says

There's only one problem: the solution proposed by commenters is outmoded, inappropriate and unsustainable.

First, it assumes that availability can be increased by use of enterprise-grade equipment. The fact is, every type of equipment fails, often at inconvenient times. Any strategy that counts on the "right" equipment to magically improve availability is doomed to failure.

Resource failure is an unfortunate reality. The real question is what user organizations should do to protect themselves from it. I view the "negotiate harder on the SLA" strategy as akin to "the beatings will continue until morale improves," meaning that it makes the SLA-demander feel better but is unlikely to result in any actual improvement.


Many of the cloud providers commenting on the AWS outage propose this kind of solution. In my view, this demonstrates how poorly they understand this issue. Their hardware will fail, too. Those engaged in taunting a competitor when it experiences a service failure should remember that pride goes before a fall.

Second, ironclad change control processes are not actually going to reduce resource failure. This is because anything involving human interaction is subject to mistakes, which result in failure. It's instructive to note that both major AWS outages were not the result of hardware failure, but of human error: specifically, human error that interacted with system design assumptions that failed to account for the type of error that occurred. And even organizations that are strongly ITIL-oriented experience human-caused problems.

Finally, the solutions proposed don't account for the world of the future. Every company is going to experience a massive increase in IT scale. The belief that rigid enough processes, with enough checks and balances, will reduce failure ignores how inadequate that approach is for this new IT world. No IT organization (and no cloud provider) will be able to afford enough people (or enough enterprise-grade equipment) to pursue this type of solution.

Redundancy, Failover Have Been Best Practices For a Long Time

The true solution to resource failure has long been known: redundancy and failover. Instead of a single server, use two; if one goes down, it's possible to switch over to the second to keep an application running. It's just that, in the past, implementing redundancy was unaffordable except for a tiny percentage of truly mission-critical applications, given the cost of hardware and software.

The genius of cloud computing is that it offers the ability to address this redundancy easily and cheaply. Many users have designed their apps to be resilient in the face of individual resource failure and have protected themselves against it. Those who pursue the traditional solutions proffered by many commenters have not, and will inevitably suffer an outage when the enterprise-grade equipment fails.
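
For readers who want to see the idea in miniature, here is a rough Python sketch of failover between two redundant resources. Everything in it is hypothetical: the `primary` and `secondary` functions stand in for real servers, and a production system would add health checks, timeouts and retries.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every redundant resource has failed."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each redundant replica in order and return the first success."""
    errors = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed")


def primary() -> str:
    # Hypothetical primary server that is currently down.
    raise ConnectionError("primary unavailable")


def secondary() -> str:
    # Hypothetical standby server that is healthy.
    return "served by secondary"


result = call_with_failover([primary, secondary])
print(result)  # served by secondary
```

Note that the sketch only survives the loss of an individual resource. If both replicas depend on the same failed service, the call still fails.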


More troubling are the infrequent failures that involve human error and result in more widespread service failure. In other words, it's not just one application's resources being unavailable, but a service being out for a large number of applications.

It's tempting to believe the problem is that Amazon just doesn't have good process or smart enough people working for it and that, if those aspects were addressed by it (or another provider), then these infrequent failures wouldn't occur.

This attitude is wrong. These corner case outages will continue, unfortunately. We are building a new model of computing, one that is highly automated and vastly scaled, with rich functionality, and the industry is still learning how to operate and manage it. Inevitably, mistakes will occur. The mistakes are typically not simple errors but, rather, unusual conditions triggering unexpected events within the infrastructure. While cloud providers will do everything they can to prevent such situations, they will undoubtedly occur in the future.

In the End, It Comes Down To Risk

What is the solution for these infrequent yet widespread service outages? AWS recommends more extensive redundancy measures that span geographic regions. Given AWS scoping, that would protect against region-wide resource unavailability. There's only one problem. Implementing more expansive redundancy is complex and expensive, far more so than the simpler measures associated with resource redundancy.


This brings us back to the topic of risk. Remember, it's the probable frequency of a failure weighed against the probable magnitude of the loss that failure causes. You have to estimate how often you expect these less-frequent, larger-scale resource failures to occur and compare the resulting loss to the cost of preventing them via design and operations. In some sense, one is evaluating the cost of careful design and operation vs. the cost of a more general failure.

Certainly the cost of the design and operation can be worked out. The cost of a more widespread failure that would take an application offline is something many people prefer not to think about. However, as more high-revenue applications move to AWS, failing to evaluate risk and implement appropriate failure-resistant measures will be imprudent.
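
One way to make that comparison concrete is to put each design option on the same footing: annual design and operations cost plus expected annual loss from outages. The Python sketch below does this; every figure in it is invented purely for illustration, and the `Design` class and option names are assumptions of the example, not anything AWS publishes.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    annual_cost: float        # design and operations cost per year
    outages_per_year: float   # expected outages that take the app offline
    loss_per_outage: float    # revenue and other loss per outage

    def total_expected_cost(self) -> float:
        """Cost of building and running the design plus expected outage loss."""
        return self.annual_cost + self.outages_per_year * self.loss_per_outage


# All figures below are hypothetical and chosen only to show the method.
designs = [
    Design("no redundancy", 10_000, 2.0, 50_000),
    Design("resource redundancy", 25_000, 0.5, 50_000),
    Design("multi-region redundancy", 90_000, 0.1, 50_000),
]

best = min(designs, key=Design.total_expected_cost)
for d in designs:
    print(f"{d.name}: {d.total_expected_cost():,.0f}")
print("lowest expected cost:", best.name)
```

With these made-up inputs, simple resource redundancy comes out cheapest. Raise the loss per outage into the millions and the multi-region design wins instead, which is exactly why the loss estimate cannot be skipped.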

Overall, it's not as though the possibility of these outages is unknown, or that the appropriate mitigation techniques are hard to discover. You should expect that cloud service providers (CSPs) will suffer general resource outages and not blame the provider in the event of such an outage. Instead, you should recognize that you made a decision, perhaps without acknowledging the risk associated with it. Those who look at these outages and choose to do nothing more than damn the provider and demand perfection don't recognize how dangerous a game they are playing.

