Here's What's Different About 'The Cloud' - InformationWeek

Charles Babcock
What's different about cloud computing versus the forms of computing that have gone before? It's really just a matter of scale, isn't it? The Google or eBay data centers are maybe a little bigger than a big enterprise data center, right? Wrong. One answer lies in an example like Hadoop.

Hadoop is an Apache incubator project that is likely to soon move to full-fledged, open source project status. It has all the qualities of good cloud software, as opposed to large-scale enterprise software. It's designed to work on a large server cluster of x86 instruction set machines. Such clusters exist in enterprises everywhere, but few scale to the 25,000 servers at Yahoo that are running instances of Hadoop.

Google probably has even larger clusters running one of the building blocks of Hadoop, called MapReduce, which originated inside Google. MapReduce knows where the data you're about to analyze is coming from--which disk drives--and it connects that knowledge to a map of the available processors, assigning each piece of data processing to the processor closest to the data's point of origin. This allows a lot of data to flow off hundreds or thousands of disk drives at a time, hit an analysis point very quickly, and produce results, which are aggregated into one grand result, such as the answer to a search query.
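The pattern itself is simple enough to sketch in a few lines. Below is a minimal, single-process illustration using the classic word-count example; the function names and structure are my own, not Hadoop's API, and real Hadoop runs the map and reduce phases in parallel on the machines that actually hold the data.

```python
from collections import defaultdict

def map_words(document):
    """Map phase: emit a (word, 1) pair for every word in one document."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_counts(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return (word, sum(counts))

def map_reduce(documents):
    # Shuffle: group the mapped values by key so each reducer
    # sees every value for its key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_words(doc):
            groups[key].append(value)
    # Reduce each group independently -- the step a cluster
    # would farm out to many machines at once.
    return dict(reduce_counts(k, v) for k, v in groups.items())

docs = ["the cloud", "the cluster and the cloud"]
print(map_reduce(docs))  # {'the': 3, 'cloud': 2, 'cluster': 1, 'and': 1}
```

Because each reduce call touches only its own key's values, the work partitions cleanly across as many processors as there are keys.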

The data would have been put on the disks in the same manner, striped across many drives so that a single data set could be retrieved by extracting 64-megabyte chunks off each of 1,000 disks. In effect, analyzing 64 gigabytes takes only slightly longer than analyzing 64 megabytes. That's cloud computing. It builds out a large cluster of highly similar servers, manages them as a unit, operates them as a parallel machine -- exploiting distributed memory, distributed processing, and distributed storage -- and achieves big results using low-cost parts.
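As a back-of-the-envelope illustration, here is a toy model of that striping -- my own sketch, not HDFS code -- which shows why a thousand-chunk read takes about as long as a one-chunk read: every disk holds one chunk and all of them stream at once.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the Hadoop block size of the era

def stripe(file_size, num_disks):
    """Assign each 64 MB block of a file to a disk, round-robin.

    Returns {disk_index: [block_index, ...]}. This is a toy placement
    policy; real HDFS also weighs replication and rack locality.
    """
    num_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    placement = {d: [] for d in range(num_disks)}
    for block in range(num_blocks):
        placement[block % num_disks].append(block)
    return placement

# A data set of 1,000 blocks (roughly 64 GB) striped over 1,000 disks:
# each disk holds exactly one 64 MB chunk, so all 1,000 chunks can
# stream off their drives simultaneously.
layout = stripe(1000 * BLOCK_SIZE, 1000)
print(all(len(blocks) == 1 for blocks in layout.values()))  # True
```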

The other part of Hadoop is the Hadoop Distributed File System, which enables very large data sets to be stored on many disks and retrieved using parallel methods.
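Retrieval works the same way in reverse: each block is fetched independently and the pieces are reassembled in order. A rough local stand-in for the idea, using a thread pool in place of a cluster (my sketch, not the HDFS client API):

```python
import concurrent.futures
import os
import tempfile

def read_block(path, offset, length):
    # In HDFS each block would sit on a different machine's disk,
    # so these reads would be truly parallel, not just concurrent.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def parallel_read(path, block_size, num_blocks):
    # Fan the block reads out across a pool, then reassemble in order.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(read_block, path, i * block_size, block_size)
                   for i in range(num_blocks)]
        return b"".join(f.result() for f in futures)

# Demo: write four 4-byte "blocks", then read them back in parallel.
data = b"abcdefghijklmnop"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
reassembled = parallel_read(tmp.name, block_size=4, num_blocks=4)
os.unlink(tmp.name)
print(reassembled == data)  # True
```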

I believe, but do not know for sure, that MapReduce-style operations are the secret to the marvelous Google search engine, which achieves so much for each user in about a second's worth of processing.

But search and the related job of indexing the Web are not the only tasks that Hadoop (and MapReduce) are good for. Hadoop can make the Web more personal by analyzing the activity of individual visitors to Yahoo and then serving them the ads best suited to their interests, making such advertising less hit-or-miss. Relational databases, on the other hand, are good for more precise tasks that consume structured data. Hadoop is more the baleen whale of databases, taking in masses of unstructured material at a gulp without too much discrimination.

Hadoop and other cloud software have another important characteristic not found in the enterprise world. If software is going to run on a large cluster, then it's going to experience hardware component failures, and those failures can't be allowed to bring the whole operation to a grinding halt. So Hadoop doesn't keep one copy of the data but two or three. It recognizes a hardware failure when one occurs, and far from shutting things down, it turns to a replicated copy and tells another processor to pick up the workload.
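That replication scheme can be sketched as a toy cluster -- the class and method names below are invented for illustration, not Hadoop's own: each block is written to three machines, and a read simply skips any machine that has failed.

```python
import random

class Cluster:
    """Toy model of replica-based fault tolerance."""
    REPLICATION = 3

    def __init__(self, num_machines):
        self.machines = {m: {} for m in range(num_machines)}
        self.down = set()  # machines that have failed

    def write(self, block_id, data):
        # Store three copies of the block on three distinct machines.
        for m in random.sample(sorted(self.machines), self.REPLICATION):
            self.machines[m][block_id] = data

    def read(self, block_id):
        # Try each replica in turn, skipping failed machines; the
        # cluster keeps answering as long as one replica survives.
        for m, blocks in self.machines.items():
            if m not in self.down and block_id in blocks:
                return blocks[block_id]
        raise IOError("all replicas of %s are unreachable" % block_id)

cluster = Cluster(5)
cluster.write("b1", b"payload")
holders = [m for m, b in cluster.machines.items() if "b1" in b]
cluster.down.update(holders[:2])   # two of the three replicas fail
print(cluster.read("b1"))          # b'payload' -- still readable
```

The failover costs nothing but disk space, which is exactly the trade the cloud makes: cheap redundant hardware in exchange for software that never has to stop.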

This is how "clouds" on the Internet differ from clusters in the enterprise: they are engineered in software to tolerate hardware failure. Fault tolerance has been kicked upstairs from the hardware to the software, where it's cheaper to supply if you just throw enough inexpensive hardware at it. In this, cloud software resembles the Internet itself, which was designed to tolerate hardware failure by detecting it and routing around it. The Internet keeps running, no matter what. Hadoop is designed to do the same, and future cloud software will share this characteristic.

Who cares? Well, "half the start-ups in Silicon Valley use Hadoop on Amazon's EC2," said Eric Baldeschwieler, Yahoo's VP of Hadoop software development, Nov. 3 at the Cloud Computing Conference & Expo in Santa Clara. Those start-ups might teach you more about Hadoop, if they find a way to use it disruptively for competitive advantage against your company.

In writing about Hadoop earlier this week, I cited Yahoo's use of Hadoop and noted that Yahoo makes its tested production version of Hadoop available for free, a boon to anyone who wishes to use cloud-style data analysis. It should also be noted that Yahoo invests in Hadoop's continued development and contributes that work to the Apache open source Hadoop code base. Of roughly 20 committers to the incubator project, 11 work at Yahoo. Yahoo offers no public cloud resource for rent by the hour, as Amazon does with EC2 and Google does with App Engine. Instead, Yahoo is concentrating on building services spawned by an internal, private cloud that is still being built out.

How is the cloud different from predecessor forms of computing? It is an evolutionary outgrowth of them, with potentially revolutionary results. "Cloud is a promise… a long journey," said Surendra Reddy, VP of Yahoo's Integrated Cloud and Virtualization Group. And that journey has just begun.

