EMC Tries To Unify Big Data Analytics - InformationWeek



EMC Greenplum Modular Data Computing Appliance puts SQL and Hadoop in the same box, but is it a truly cohesive platform?

Two separate worlds have emerged in big data analytics, and on Wednesday EMC announced a Greenplum appliance that aims to bring them together.

On the one hand there's structured data that fits neatly into the columns and rows of relational databases. Those databases have mastered such data, and even when it gets big (meaning north of about 10 terabytes), there are options such as massively parallel processing, supported by products such as EMC's Greenplum database.

On the other hand there's the array of semi-structured, unstructured, and inconsistent data types like server log files, sensor data, social-network comments, and other forms of text-centric information. For that world the Hadoop open-source project has emerged as the leading platform for making such information computable. (Hadoop also handles highly structured data, but mostly as a high-capacity, low-cost data store.)
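
To make "computable" concrete, here is a minimal sketch of the MapReduce pattern that Hadoop applies to semi-structured data such as server logs. It runs in plain Python on a single machine rather than on Hadoop, and the log format, the sample lines, and the function names (map_line, reduce_counts, run_job) are illustrative assumptions, not anything taken from EMC or from the Hadoop APIs.

```python
from collections import defaultdict

# Hypothetical web-server log lines; the format is an assumption for illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - "GET /index.html" 200',
    '10.0.0.2 - - "GET /missing" 404',
    '10.0.0.1 - - "POST /login" 200',
    'corrupted line without a status code',
]


def map_line(line):
    """Map step: emit (status_code, 1) for each line that parses."""
    parts = line.rsplit(" ", 1)
    if len(parts) == 2 and parts[1].isdigit():
        yield parts[1], 1
    # Lines that do not parse are skipped, since real logs are inconsistent.


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())


if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

On a real cluster the map and reduce steps would run in parallel across many nodes, with the framework handling the grouping in between; the sketch only shows the shape of the computation.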


With Wednesday's release of the EMC Greenplum Modular Data Computing Appliance (DCA), EMC says it has unified these heretofore separate domains. It's a follow-up to the company's announcement last May of Greenplum HD Community and Enterprise distributions of Hadoop software and a promise to deliver a Hadoop appliance.

Greenplum's Community edition includes Hadoop MapReduce, the HDFS distributed file system, the Apache Hive query tool, the HBase column-oriented data store, and the ZooKeeper tool for configuring clusters. The Enterprise edition adds proprietary features for snapshotting and replication of Hadoop clusters, as well as system management capabilities.

The Modular DCA is one box that can support multiple quarter-rack deployments that can be mixed, matched, and scaled. You can start with a standard Greenplum Database Module for scalable SQL analysis and add a quarter-rack Greenplum HD module for running EMC's Hadoop release.

Other quarter-rack options include the Greenplum Database High Capacity Module, which offers more storage and less compute capacity than a standard module, for high-scale, long-term archival storage at a lower cost per terabyte. There's also a Greenplum Data Integration Accelerator (DIA) module designed to host partner applications, such as predictive analytics capabilities from SAS, data-integration software from Informatica, and other options said to be in review.

EMC's modular approach lets you scale standard SQL, Hadoop, archival, or analytic application capacity in quarter-rack increments up to a total of six full racks. EMC says its approach will not only save money by eliminating the need for separate hardware platforms but will also speed insight and minimize storage demands by streaming Hadoop analyses directly into the Greenplum database. In this approach, data doesn't have to be created and stored in one environment and then copied and moved into another.
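
The scaling arithmetic can be sketched as follows: at four quarter-rack modules per rack, the six-rack ceiling works out to 24 modules in any mix. This is a rough illustration only. It assumes every module type occupies exactly one quarter rack, and the short module keys and the racks_used function are hypothetical names, not EMC configuration terms.

```python
QUARTERS_PER_RACK = 4
MAX_RACKS = 6  # upper limit stated in the article
MAX_MODULES = QUARTERS_PER_RACK * MAX_RACKS  # 24 quarter-rack modules

# Module types named in the article; the short keys are made up for this sketch.
MODULE_TYPES = {"database", "hd", "high_capacity", "dia"}


def racks_used(config):
    """Return the rack count for a {module_type: quarter_rack_count} mapping."""
    unknown = set(config) - MODULE_TYPES
    if unknown:
        raise ValueError(f"unknown module types: {sorted(unknown)}")
    if any(count < 0 for count in config.values()):
        raise ValueError("module counts must be non-negative")
    total = sum(config.values())
    if total > MAX_MODULES:
        raise ValueError(f"{total} modules exceeds the {MAX_MODULES}-module limit")
    return total / QUARTERS_PER_RACK


if __name__ == "__main__":
    # One full rack of SQL capacity, half a rack of Hadoop, one DIA module.
    print(racks_used({"database": 4, "hd": 2, "dia": 1}))
```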

EMC used the words "coprocessing" and "marriage" to describe the blend of SQL and Hadoop within the modular appliance, but it's not quite that harmonious just yet.
