The availability of everRun as a feature of the operating system could allow Windows Server 2008 to host multiple virtual machines running mission-critical systems.

Charles Babcock, Editor at Large, Cloud

January 9, 2009


Fault tolerance will become an add-on feature of the Microsoft Windows Server 2008 operating system that can be switched on with little administrative training, according to spokesmen for Microsoft and Marathon Technologies, supplier of the everRun fault-tolerant system. It will become available in the second quarter.

For the highest level of availability, where users experience no interruption even though a component of the underlying system has failed, the price tag will be about $5,000. A lower level of availability, where the system recovers after a noticeable 20-second delay, will cost $2,000.

The move lowers the level of skill needed to implement fault tolerance, which was pioneered by Tandem Computers with its NonStop systems, still available through Hewlett-Packard. For many years, system fault tolerance that could survive the failure of any hardware or software component has been a highly engineered, high-expertise area of data center operation. It's often reserved for real-time trading systems and online banking transaction systems that handle billions daily.

"Before, it was very complex to deploy and manage. It was mainly the top tier of computing that benefited from fault tolerance," said Gary Phillips, president and CEO of Marathon, in an interview. Now a Windows Server administrator can activate the fault-tolerant feature without special training, after choosing the level of availability required.

In addition, Marathon's partnership with Microsoft will enable Microsoft to bring fault-tolerant operation to its Hyper-V virtual machines as well. No date has been specified other than in a "future version of Hyper-V," according to a Microsoft spokesman.

Windows Server 2008 users of either physical or virtual machines frequently want high-availability features or full system fault tolerance for their business workloads, especially Web-facing workloads, said Mike Schultz, Microsoft's director of product management for the Windows Server division, in a statement on the announcement.

The availability of everRun as a feature of the operating system could allow Windows Server 2008 to host multiple virtual machines running mission-critical systems while still keeping the risks in bounds.

Virtual machine users have been reluctant to stack up mission-critical systems, despite the pressure for server consolidation, because it puts "all the eggs in one basket," noted Phillips. If companies decide to do so, "this is where fault tolerance is no longer a luxury but a requirement," he said. Microsoft will at least gain a competitive talking point to use with customers considering VMware as an alternative. "This work represents the value of Microsoft's partner ecosystem..." Schultz said.

VMware is working on high-availability features for ESX Server as well.

In addition to full fault tolerance, everRun allows data center administrators to select lower levels of availability. Level 1 uses cluster failover: if a server component fails, the work is moved to another physical server.

In Marathon's lexicon, Level 2 availability is where a system recovers all data and application processing in about 20 seconds by handling network and storage component failures. Network and storage failures account for about 80% of system failures, Phillips said. Level 2-style protection is currently available for Citrix XenServer and Windows Server 2003 operations, and will be available in the second quarter for Windows Server 2008 and XenServer. At some point in the future, it will be available for Windows Server 2008 hosts running any number of Hyper-V virtual machines.

Level 3 system fault tolerance recovers within milliseconds after any software or hardware component fails, a disruption unlikely to be noticed by the end user. EverRun can guarantee fast recovery by maintaining a live virtual machine on an identical physical server. If one system fails, the other immediately fills in. Level 3 fault tolerance will be available for Windows Server 2008 and the Citrix XenServer hypervisor in the second quarter, with a Hyper-V version available at an unspecified point in the future.

It's relatively easy to recover a failed virtual machine by itself, and several vendors offer ways to do it. One way is simply to go back to where the virtual machine's image was stored and activate another instance. But such a move does nothing to recover application processing at the point of failure or any lost data. It's more complicated to recover both the virtual machine's application processing and its data at the instant of failure.

About the Author(s)


Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.
