Hadoop World NYC Highlights Budding Alternative for Big Data - InformationWeek



East Coast event highlights growing, mainstream adoption of open-source software designed for terabyte- to petabyte-scale data processing.

"Our storage footprint tripled between 2007 and 2009... so why wouldn't we consider Hadoop?"

This testimony, shared by Sih Lee of JP Morgan Chase, pretty much sums up the running theme at last week's Hadoop World New York City. We're entering a petabyte era, so organizations of all kinds are looking for new alternatives to handle 'big data' processing challenges. (See the influencers and read what they're saying in our accompanying "Hadoop World NYC Image Gallery.")

Hadoop is an open-source software project that was originally based on MapReduce processing principles articulated in a Google white paper published in 2004. The project has since flourished and expanded beyond MapReduce to add subprojects, including the Hadoop Distributed File System (HDFS); Pig data flow language; the HBase distributed, column-oriented database; and the Hive distributed data warehouse.
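For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal, single-process sketch of the idea in Python. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would spread the map and reduce steps across many machines and read its input from a distributed file system such as HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

Because each map call looks at only one record and each reduce call looks at only one key, the work can in principle be split across many commodity machines, which is the property that makes the model attractive at terabyte to petabyte scale.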

Web-based companies have led Hadoop adoption, and Yahoo!, Amazon, Facebook and eHarmony executives were on hand at Hadoop World NYC to extol the software's virtues and share details of their deployments. The key point of the event, however, was to highlight and encourage mainstream adoption.

"Hadoop is now everywhere and it's not just for Web companies, it's for all types of companies," stressed Christophe Bisciglia, founder of Cloudera, the Hadoop-focused professional services firm that organized the event.

The testimony of JP Morgan Chase’s Lee helped prove Bisciglia's point about mainstream corporate adoption. Lee, a vice president responsible for "Firmwide Innovation & Shared Services Strategy," said the firm has been exploring Hadoop for more than 18 months. It now has several proof-of-concept projects in the pipeline, seeking cost efficiencies over conventional technologies such as storage area networks, network-attached storage and symmetric multiprocessor hardware.

"Hadoop gives us a cost proposition that is an order of magnitude more cost efficient than some of the competing technologies," he said. "Another driver for considering Hadoop is choice... Having a single-vendor technology lock-in does not help us form a sound strategy overall. The ability to embrace a new technology such as Hadoop gives us another option from which to make sound decisions and choices."

Lee positioned MapReduce and the Hadoop Distributed File System generically as an alternative for petabyte-scale, relatively high-latency data processing, though he declined to detail specific applications at the financial services firm. Other speakers were more forthcoming: Facebook described its Hive-based data warehouse implementation in detail, and eHarmony discussed the advantages of cloud-based MapReduce processing in preparation for internal data warehouse analysis.

Cloudera describes Hadoop as a complement to, rather than a replacement for, existing systems:

Hadoop is not a database nor does it need to replace any existing data systems you may have. Hadoop augments these systems by offloading the particularly difficult problem of simultaneously ingesting, processing and delivering/exporting large volumes of data so existing systems can focus on what they were designed to do, whether that is serving real-time transactional data or providing interactive business intelligence.

Many Hadoop instances (and certainly most of the largest scale Hadoop instances) are built on homegrown implementations of commodity hardware. A few commercial vendors have embraced Hadoop. Aster Data Systems, for instance, supports both SQL- and Hadoop-based MapReduce, and last week it introduced a connector for separate Hadoop instances (built on Aster or other platforms). Vertica also has a connector for Hadoop-based MapReduce implementations.

Amazon has brought Hadoop-based MapReduce to the cloud through its Elastic MapReduce Web service on EC2, and last week it added support for the Hadoop Hive distributed data warehouse.

Judging by the strong turnout, with some 500 developers and advocates in attendance, it's clear that Hadoop is part of a disruptive wave of technologies emerging for big data problems, and mainframes, conventional storage systems and proprietary data management software will bear the brunt of the impact.
