2 Ways To Ease Hadoop Growing Pains - InformationWeek


Commentary
2/2/2012
02:07 PM
Doug Henschen

2 Ways To Ease Hadoop Growing Pains

EMC Isilon and RainStor address enterprise gaps in the open source Apache Hadoop framework.

Interest in Hadoop is booming, so it should be no surprise that commercial vendors are piling on with products that promise to make the open source big data platform more reliable, more versatile, less expensive (by reducing required hardware investments) or faster.

Enter EMC Isilon and RainStor, both of which say they're plugging gaps in Hadoop to meet enterprise-grade needs. Each vendor brings a new twist to HDFS, Hadoop's distributed file system. EMC Isilon has tied its network-attached storage to HDFS, while RainStor has added a database on top of the file system that promises high compression as well as support for SQL analysis.

With the latest upgrade of its NAS operating system, OneFS 6.5, released this week, EMC's Isilon Systems unit has integrated its NAS architecture with the HDFS protocol. This integration lets customers scale out storage in a distributed fashion, but on an Isilon NAS rather than on the commodity hardware typically used to run HDFS.


Isilon's new Hadoop option will be most attractive to customers that already have the vendor's NAS. The combination lets them use the platform for multiple high-scale storage needs. There's no need to create a separate, commodity-hardware-based storage platform just for Hadoop, though customers will still need a clustered server environment to provide Hadoop's compute capacity (which is available by way of the EMC Data Computing Appliance).

Another benefit of Isilon's NAS is that it provides enterprise-class data protection capabilities, including snapshots, replication, and backup. This eliminates the single point of failure inherent in Hadoop's NameNode, the controlling node of a cluster, which holds the metadata about the files stored on each data node. Isilon's NAS ensures high availability, and snapshots can be used to rebuild the cluster in the unlikely event of a complete failure.
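To see why the NameNode matters so much, here is a minimal Python sketch of the idea. It is a toy model, not Hadoop or Isilon code: the `ToyNameNode` class, the file path, and the block and node names are all made up for illustration. It only shows that losing the metadata makes files unreachable even though the blocks still exist, and that a saved copy of the metadata brings them back.

```python
import copy


class ToyNameNode:
    """Toy stand-in for a NameNode: holds only metadata, never file contents."""

    def __init__(self):
        # file path -> ordered list of (block_id, [data nodes holding a replica])
        self.block_map = {}

    def add_file(self, path, blocks):
        self.block_map[path] = blocks

    def locate(self, path):
        """Return the block locations a client needs in order to read the file."""
        return self.block_map[path]

    def snapshot(self):
        """Return an independent copy of the metadata."""
        return copy.deepcopy(self.block_map)

    def restore(self, snap):
        self.block_map = copy.deepcopy(snap)


namenode = ToyNameNode()
namenode.add_file(
    "/logs/2012-02-01.log",  # hypothetical file
    [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])],
)
saved = namenode.snapshot()

# Simulate losing the metadata: the blocks still sit on the data nodes,
# but nothing records which blocks make up which file.
namenode.block_map.clear()
lost = "/logs/2012-02-01.log" not in namenode.block_map

# Restoring the saved metadata makes the file locatable again.
namenode.restore(saved)
recovered = namenode.locate("/logs/2012-02-01.log")
```

In a real cluster the metadata is far larger and changes constantly, which is why protecting it is harder than this sketch suggests.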

The Isilon NAS can be used with any Apache Hadoop distribution, including those from Cloudera, Hortonworks, and MapR. But part of the appeal of the vendor's new Hadoop support is one-stop shopping and support from EMC, which also offers the Greenplum HD community distribution of Apache Hadoop.

Somewhat confusingly, EMC also offers the recently renamed Greenplum MR distribution (formerly HD Enterprise Edition), which is based on MapR's distribution of Hadoop. MapR does away with the NameNode problem entirely by replacing HDFS with its own proprietary file system, which applications can also reach through NFS (the Network File System protocol long used on Unix). MapR's proprietary components support high availability and, the vendor maintains, higher scalability and performance than HDFS. EMC bills Greenplum MR as its high-performance distribution, but the name change and new Isilon tie hint that EMC is hedging its bets with Apache and proprietary MapR Hadoop distributions.

Compress For Success

RainStor started working on the big data problem long before the term became fashionable. The eight-year-old company has focused mostly on high-scale archival storage to meet compliance needs. RainStor Big Data Analytics, introduced last month, puts the vendor's database technology on top of HDFS. The promise is high data compression--up to 40x--while supporting both SQL querying and MapReduce processing of that data.

Data compression is a gift that keeps on giving because it reduces storage requirements and cost. RainStor says Hadoop clusters can be 50% to 80% smaller in terms of storage capacity with its technology in place. If you already have a Hadoop cluster, adding RainStor should let you store at least twice the data, and by the vendor's figures as much as five times (depending on the data type), without adding hardware.
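The link between "percent smaller" and "times more data" is simple arithmetic, sketched below in Python. The function name `capacity_multiplier` is just an illustrative label, and the calculation takes RainStor's 50% to 80% figures at face value; the doubling figure sits at the conservative end of that range.

```python
def capacity_multiplier(size_reduction):
    """How many times more raw data fits in the same storage when stored
    data shrinks by `size_reduction` (a fraction between 0 and 1)."""
    if not 0 <= size_reduction < 1:
        raise ValueError("size_reduction must be in [0, 1)")
    return 1 / (1 - size_reduction)


for reduction in (0.5, 0.8):
    print(f"{reduction:.0%} smaller -> {capacity_multiplier(reduction):.1f}x the data")
# prints:
# 50% smaller -> 2.0x the data
# 80% smaller -> 5.0x the data
```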

RainStor's database can query data in its compressed format, eliminating the decompression step and improving performance. The caveat is that the technology is best suited to historical data that doesn't change, not operational information that's constantly updated (like a customer database). The compression technology relies on value and pattern de-duplication techniques, so it's best suited to data that has repeating values or patterns. Log files, clickstreams and call detail records fit this description (and are also historical records), but video, image and voice data do not.
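A rough feel for value de-duplication, and for querying without decompressing, comes from plain dictionary encoding, sketched here in Python. This is a generic textbook technique used as a stand-in, not RainStor's actual algorithm, and the function names and sample column are hypothetical.

```python
def dictionary_encode(values):
    """Store each distinct value once and replace every row with a small integer code."""
    dictionary = {}  # value -> code
    codes = []
    for v in values:
        # A value seen for the first time gets the next unused code.
        code = dictionary.setdefault(v, len(dictionary))
        codes.append(code)
    return dictionary, codes


def count_matching(dictionary, codes, wanted):
    """Count rows equal to `wanted` by comparing codes,
    without rebuilding the original values."""
    code = dictionary.get(wanted)
    if code is None:
        return 0  # the value never occurs, so no scan is needed
    return sum(1 for c in codes if c == code)


# Hypothetical log column with heavily repeated values
status_column = ["200", "200", "404", "200", "500", "404", "200"]
dictionary, codes = dictionary_encode(status_column)

print(codes)                                     # [0, 0, 1, 0, 2, 1, 0]
print(count_matching(dictionary, codes, "404"))  # 2
```

The more often values repeat, the smaller the encoded form becomes relative to the original, which is consistent with the point that logs compress well under this kind of scheme while video and images, which rarely repeat exact values, do not.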

More To Come

The EMC and RainStor announcements aren't the first of their kind, and they won't be the last. In November, Cloudera announced support for the NetApp Open Solution for Hadoop, a reference storage architecture based on the storage vendor's hardware. Like EMC Isilon's Hadoop offering, Open Solution decouples storage and compute capacity while promising higher availability and reliability than a conventional deployment.

RainStor's ability to run both SQL and MapReduce is appealing because you don't have to bother with moving large data sets between separate environments, a time-consuming task. Other vendors also offer a single point of access to Hadoop; in the case of Hadapt, for instance, it's all about using SQL, MapReduce, and related analytics from one spot. Compression isn't part of that story.

The Internet giants that pioneered Hadoop were used to its rough edges. Enterprises used to stability and high availability won't be so forgiving. It's a safe bet that more vendors and proprietary enhancements will emerge.
