Yesterday was the 5th anniversary of the biggest electrical blackout in North American history. Some 50,000,000 people from Ohio to D.C. to Ontario (Canada, not California) were without power for up to four days. The mainstream media is covering the big picture and lessons the power industry can learn to make the grid more resistant to trees knocking down power lines. I wanted to take the opportunity to address the questions this event raises for IT.

Howard Marks, Network Computing Blogger

August 15, 2008

2 Min Read

While I'm not generally a fan of planning for specific disasters (Murphy's Law says if you prepare for wildfires, you'll get hit with mudslides or earthquakes instead), the 2003 blackout does raise some questions your disaster recovery plan will have to address.

First is distance. Several factors lead most organizations to keep their DR site within a reasonable driving distance of their primary site: the effect of distance on network latency (and, with synchronous replication, on application performance), and the convenience of having someone drive out to the DR site to install a new server or a memory upgrade. Here in New York that usually means across a river in New Jersey, upstate (yes, for a real New Yorker, White Plains is upstate), or Connecticut. All of which were blacked out.

So the question comes up, is greater distance -- to, say, Nevada -- worth the cost and trouble?
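The latency cost of distance is easy to estimate. Here's a back-of-the-envelope sketch (my numbers, not the article's): light in fiber travels at roughly 200 km per millisecond, and a synchronous write can't be acknowledged until at least one round trip to the replica completes. The distances below are rough illustrative figures.

```python
# Rough sketch of the distance-vs-latency trade-off for synchronous
# replication. Assumption: signal propagation in fiber is ~200 km/ms
# (about two-thirds the speed of light), and each synchronous write
# waits at least one round trip before it can be acknowledged.
# Real links add switching and protocol overhead on top of this floor.

FIBER_KM_PER_MS = 200.0  # assumed propagation speed in fiber

def sync_write_penalty_ms(distance_km: float) -> float:
    """Minimum added latency per synchronous write: one round trip."""
    return 2 * distance_km / FIBER_KM_PER_MS

# Illustrative distances (approximate, for the sake of the example)
for site, km in [("across the Hudson (~10 km)", 10),
                 ("White Plains (~40 km)", 40),
                 ("Nevada (~4,000 km)", 4000)]:
    print(f"{site}: at least {sync_write_penalty_ms(km):.2f} ms per write")
```

A tenth of a millisecond across the river is invisible to applications; 40 ms to Nevada, paid on every committed write, is why long-haul replication is usually asynchronous.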

The second is when to declare a disaster and activate the DR site. The blackout struck New York at 4:11 p.m. on a Thursday, and most New York City locations had power restored by morning. Organizations that didn't activate their DR sites, and whose servers sat in New York office towers without generators, had to restart those servers when the power came back. Either the IT guys had a really long night or the users had a slow morning.

Those that did activate not only had to limp through Friday running from the DR site, but probably had to spend the weekend failing back to their primary systems. The failback process is rarely as well thought out or tested as the failover, so I'm sure it was a long weekend for some folks.

Would you have declared a disaster at 4:11? At midnight? Or would you just consider Friday to be like a snow day and postpone the decision till Sunday, knowing blackouts rarely last more than a day?

More important, do you have a process as part of the DR plan?
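One way to make that process concrete is to write the decision rule down. This is a hypothetical sketch, not anything from the article: declare only when the expected outage exceeds what the business can tolerate (its recovery time objective) plus the downtime the failover and eventual failback will themselves cost.

```python
# Hypothetical declaration rule for a DR runbook (an illustration,
# not a prescribed standard). The idea: riding out a short outage is
# cheaper than a failover whose failback will eat the weekend.

def should_declare(estimated_outage_hr: float,
                   rto_hr: float,
                   failover_hr: float,
                   failback_hr: float) -> bool:
    """Declare when waiting out the outage costs more downtime than
    failing over now and failing back later."""
    return estimated_outage_hr > rto_hr + failover_hr + failback_hr

# A 2003-style call: power expected back by morning (~12 hours out),
# but failback alone is budgeted at a full weekend day.
print(should_declare(estimated_outage_hr=12, rto_hr=2,
                     failover_hr=1, failback_hr=12))
```

The exact thresholds don't matter as much as having agreed on them, and on who makes the call, before 4:11 p.m. on a Thursday.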

About the Author(s)

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M. and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi of the monthly Greybeards on Storage podcast where the voices of experience discuss the latest issues in the storage world with industry leaders.  You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
