Here's What's Different About 'The Cloud' - InformationWeek


Cloud
Commentary
11/10/2009
06:46 PM
Charles Babcock

Here's What's Different About 'The Cloud'

What's different about cloud computing versus the forms of computing that have gone before? It's really just a matter of scale, isn't it? The Google or Amazon.com or eBay data centers are maybe a little bigger than a big enterprise data center, right? Wrong. One answer lies in an example like Hadoop.

Hadoop is an Apache incubator project that is likely to move soon to full-fledged open source project status. It has all the qualities of good cloud software, as opposed to large-scale enterprise software. It's designed to work on a large server cluster of x86 instruction set machines. Such clusters exist in enterprises everywhere, but few scale to the 25,000 machines at Yahoo that are running instances of Hadoop.

Google probably has even larger clusters running one of the building blocks of Hadoop, called MapReduce, which originated inside of Google. MapReduce knows where the data you're about to analyze is coming from--which disk drives--and it connects that understanding to a map of the processors available, assigning data processing to the processor closest to its point of origin. This allows a lot of data to flow off of hundreds or thousands of disk drives at a time, hit an analysis point very quickly and produce results, which are aggregated into some grand result, such as the answer to a search query.
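To make the idea concrete, here is a minimal sketch in Python of the pattern described above: work is assigned to the node that already holds the data, each node produces a partial result, and the partial results are merged into one answer. The block names, node names, and the word-count task are all hypothetical illustrations; this is not Hadoop's or Google's actual code, only a toy model of the technique.

```python
from collections import Counter

# Hypothetical cluster layout: which node stores which block of text.
BLOCKS = {
    "block-0": {"node": "node-a", "text": "cloud software runs on clusters"},
    "block-1": {"node": "node-b", "text": "clusters run cloud software"},
    "block-2": {"node": "node-a", "text": "software tolerates failure"},
}


def schedule(blocks):
    """Assign each block to the node that already stores it (data locality)."""
    plan = {}
    for block_id, info in blocks.items():
        plan.setdefault(info["node"], []).append(block_id)
    return plan


def map_block(text):
    """Map step: count the words in one block, close to where it is stored."""
    return Counter(text.split())


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_job(blocks):
    """Run the map step block by block, following the plan, then reduce."""
    plan = schedule(blocks)
    partials = [
        map_block(blocks[block_id]["text"])
        for node in plan
        for block_id in plan[node]
    ]
    return reduce_counts(partials)


if __name__ == "__main__":
    print(schedule(BLOCKS))
    print(run_job(BLOCKS))
```

In this toy version everything runs in one process; in a real cluster the map calls would run at the same time on different machines, which is where the speed comes from.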

The data would have been put on the disks in the same manner, striped across many drives so that a single data set could be found by extracting 64-megabyte chunks off each of 1,000 disks. In effect, analysis of 64 gigabytes takes only slightly longer than analysis of 64 megabytes. That's cloud computing. It builds out a large cluster of highly similar servers, manages them as a unit, operates them as a parallel machine, exploiting distributed memory, distributed processing and distributed storage, and achieves big results -- using low-cost parts.
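The arithmetic behind that claim can be sketched in a few lines of Python. The chunk size and disk count come from the paragraph above; the per-disk read speed of 100 megabytes per second is an assumed round number chosen only for illustration, and the sketch ignores the time needed to aggregate results.

```python
CHUNK_MB = 64           # chunk size per disk, from the article
DISKS = 1_000           # number of disks, from the article
DISK_MB_PER_SEC = 100   # assumed read speed per disk (hypothetical figure)


def read_seconds(total_mb, disks):
    """Seconds to read total_mb spread evenly over `disks` drives read in parallel."""
    per_disk_mb = total_mb / disks
    return per_disk_mb / DISK_MB_PER_SEC


total_mb = CHUNK_MB * DISKS                    # 64,000 MB, or 64 GB in decimal units
one_disk_seconds = read_seconds(total_mb, 1)   # whole data set read from a single drive
striped_seconds = read_seconds(total_mb, DISKS)  # one 64 MB chunk per drive, in parallel

if __name__ == "__main__":
    print(f"data set size: {total_mb} MB")
    print(f"one disk:      {one_disk_seconds} s")
    print(f"1,000 disks:   {striped_seconds} s")
```

Under these assumptions, reading the striped 64 gigabytes takes the same time as reading a single 64-megabyte chunk, because every drive only has to deliver its own chunk.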

The other part of Hadoop is the Hadoop Distributed File System, which enables very large data sets to be stored on many disks and retrieved using parallel methods.

I believe, but do not know for sure, that MapReduce-style operations are the secret to the marvelous Google search engine, which achieves so much for each user in about a second's worth of processing.

But search and the related job of indexing the Web are not the only tasks that Hadoop (and MapReduce) are good for. Hadoop can make the Web more personal by analyzing the activity of individual visitors to Yahoo and then serving them the ads that are most suited to their interests, making such advertising less hit or miss. A relational database, on the other hand, is good for more precise tasks that consume structured data. Hadoop is more the baleen whale of databases, taking in masses of unstructured material at a gulp without too much discrimination.

Hadoop and other cloud software have another important characteristic not found in the enterprise world. If it's going to run on a large cluster, then it's going to experience hardware component failures that can't be allowed to bring the whole operation to a grinding halt. So Hadoop doesn't generate one copy of the data but two or three. It recognizes a hardware failure when one occurs, and far from shutting things down, turns to a replicated copy and tells another processor to pick up the workload.
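The failover behavior described above can be modeled with a small Python sketch. The `Cluster` class, its method names, and the round-robin placement are hypothetical and invented for illustration; they are not Hadoop's API or its actual placement policy. The replication factor of three is one of the two values the paragraph mentions.

```python
REPLICATION = 3  # the article says two or three copies; three is assumed here


class Cluster:
    """Toy model of replicated block storage with failover (not Hadoop's API)."""

    def __init__(self, nodes):
        # Needs at least REPLICATION nodes so every copy lands on a distinct node.
        self.nodes = list(nodes)
        self.alive = set(nodes)
        self.placement = {}  # block id -> list of nodes holding a copy

    def store(self, block_id):
        """Place copies on REPLICATION distinct nodes, chosen round-robin."""
        start = len(self.placement) % len(self.nodes)
        self.placement[block_id] = [
            self.nodes[(start + i) % len(self.nodes)] for i in range(REPLICATION)
        ]

    def fail(self, node):
        """Mark a node as failed; its copies become unreachable."""
        self.alive.discard(node)

    def locate(self, block_id):
        """Return the first live node holding the block, or raise if none is left."""
        for node in self.placement[block_id]:
            if node in self.alive:
                return node
        raise RuntimeError(f"all replicas of {block_id} are lost")


if __name__ == "__main__":
    cluster = Cluster(["n1", "n2", "n3", "n4"])
    cluster.store("b0")
    print(cluster.locate("b0"))  # served by the first replica
    cluster.fail("n1")
    print(cluster.locate("b0"))  # work moves to a surviving replica
```

The job only stops when every copy of a block is gone, which is why adding cheap nodes and extra copies is a substitute for expensive fault-tolerant hardware.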

This is how "clouds" on the Internet differ from clusters in the enterprise. They are engineered to tolerate hardware failure in the software. Fault tolerance has been kicked upstairs from the hardware to the software, where it's cheaper to supply if you just throw enough inexpensive hardware at it. In this, cloud software resembles the Internet itself, which was designed to tolerate hardware failure by detecting and routing around it. The Internet keeps running, no matter what. Hadoop is designed to do the same, and future cloud software will share this characteristic.

Who cares? Well, "half the start-ups in the Silicon Valley use Hadoop on Amazon (Amazon's EC2)," said Eric Baldeschwieler, Yahoo's VP of Hadoop software development, Nov. 3 at the Cloud Computing Conference & Expo in Santa Clara. They might teach you more about Hadoop, if they find a way to use it disruptively for competitive advantage against your company.

In writing about Hadoop earlier this week, I cited Yahoo's use of Hadoop and noted that Yahoo makes its tested production version of Hadoop available for free, a boon to mankind and those who wish to use cloud-style data analysis. It should also be noted that Yahoo invests in Hadoop's continued development, and gives that development to the Apache open source Hadoop code base. Of roughly 20 committers to the incubator project, 11 work at Yahoo. Yahoo offers no public cloud resource for rent by the hour, as Amazon does with EC2 and Google does with Google App Engine. Instead, Yahoo is concentrating on building services spawned by an internal, private cloud that is in the process of being built out.

How is the cloud different from predecessor forms of computing? It is an evolutionary outgrowth of them, with potentially revolutionary results. "Cloud is a promise… a long journey," said Surendra Reddy, VP of Yahoo's Integrated Cloud and Virtualization Group. And that journey has just begun.



