Datameer Bets Visual Analysis Beats SQL On Hadoop - InformationWeek


09:06 AM


Datameer 4.0 upgrade advances visual data preparation and analysis on top of Hadoop, offering an easier alternative to SQL.


With all the hubbub about SQL-on-Hadoop options over the last year, what seems to be lost on many is the fact that SQL access won't exactly open up big data to the masses. If the arcane languages of MapReduce, Pig, and so on are akin to ancient Latin, adding SQL to the language list is a bit like suggesting the use of modern Greek.

Enter Datameer, which offers a spreadsheet-on-Hadoop approach to exploring and analyzing data on Hadoop. Datameer has been among the pioneers in supporting ad-hoc analysis, reporting, and analytics directly on data in Hadoop, and with a 4.0 version released last week, it has enhanced the product's visual data preparation and data analysis capabilities.

Research by InformationWeek and plenty of other organizations has long confirmed that data quality is one of the biggest obstacles to developing valid and valuable BI and analytics. That challenge hasn't gone away in the big data era, and that's the key reason Datameer 4.0 includes a new Visual Data Profiling feature. Toggling from a spreadsheet view to the "flip side" Visual Data Profiling view, you see details on data type, count, max, min, uniqueness, mean, and average.
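To make the profiling idea concrete, here is a minimal Python sketch of the kind of per-column summary described above. It is not Datameer's implementation; the function name `profile_column` and the sample `ages` list are hypothetical, chosen only for illustration.

```python
from statistics import mean

def profile_column(values):
    """Summarize one column: type, count, min, max, uniqueness, and mean.

    Hypothetical helper for illustration; missing values (None) are skipped.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"type": None, "count": 0, "min": None, "max": None,
                "unique": 0, "mean": None}
    numeric = all(isinstance(v, (int, float)) for v in present)
    return {
        "type": "number" if numeric else "text",
        "count": len(present),            # non-missing values
        "min": min(present),
        "max": max(present),
        "unique": len(set(present)),      # distinct values
        "mean": mean(present) if numeric else None,
    }

# Made-up sample column with one missing value.
ages = [23, 31, 31, None, 47]
profile = profile_column(ages)
```

A profile like this surfaces missing values and suspicious extremes at a glance, which is the role the article attributes to the visual view.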


The data profile also reveals data quality problems, such as having more than two categories for customer gender or having a great big gap in the age range of customers. This is where Datameer's data correction, filtering, transformation, and enrichment capabilities come into play. Once you've addressed data quality problems, you toggle between the spreadsheet and data-profile views to visually inspect and validate that you're working with reliable data.
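The two quality problems mentioned above, stray category values and a gap in a numeric range, can be checked with a few lines of Python. This is a rough sketch under stated assumptions, not a description of how Datameer detects them; `unexpected_categories`, `largest_gap`, and the sample data are all hypothetical.

```python
from collections import Counter

def unexpected_categories(values, allowed):
    """Count each value that falls outside the allowed set."""
    return Counter(v for v in values if v not in allowed)

def largest_gap(numbers):
    """Return (low, high) bounding the widest gap between sorted distinct values."""
    ordered = sorted(set(numbers))
    if len(ordered) < 2:
        return None
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])

# Made-up sample data: inconsistent gender codes and a hole in the age range.
genders = ["F", "M", "f", "M", "unknown", "F"]
ages = [22, 24, 25, 27, 58, 60, 61]

odd = unexpected_categories(genders, allowed={"F", "M"})
gap = largest_gap(ages)
```

Here `odd` flags the values that need correcting or filtering, and `gap` points to the stretch of ages with no customers at all.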

The second big advance in Datameer 4.0 is a new visual view of Datameer's clustering, column-dependency, decision tree, and recommendation algorithms. Once again, you toggle from the spreadsheet view to a flip-side visual analysis that helps business analysts understand how and why the algorithms produced their results.

The Visual Data Profile view in Datameer 4.0 offers details such as data type, count, max, min, uniqueness, mean, and average.

Instead of just seeing that there are 12,500 customers in cluster A and 28,200 in cluster B, for example, the visual view might reveal that cluster A is made up of 20- to 30-year-old customers who spent an average of $48 last month, whereas cluster B is made up of 40- to 50-year-olds who spent an average of $96. You could get to these details in Datameer 3.0, but they weren't immediately exposed in a visual view.
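The difference between a bare cluster count and a cluster description can be sketched in Python. The example assumes each record already carries a cluster label from some earlier algorithm; the `summarize_clusters` function, the field names, and the four sample records are hypothetical and are not drawn from Datameer.

```python
from collections import defaultdict
from statistics import mean

def summarize_clusters(records):
    """Describe each cluster by size, age range, and average spend."""
    groups = defaultdict(list)
    for record in records:
        groups[record["cluster"]].append(record)

    summary = {}
    for label, members in groups.items():
        ages = [m["age"] for m in members]
        spend = [m["spend"] for m in members]
        summary[label] = {
            "size": len(members),
            "age_range": (min(ages), max(ages)),
            "avg_spend": mean(spend),
        }
    return summary

# Made-up records; cluster labels are assumed to come from a prior step.
customers = [
    {"cluster": "A", "age": 22, "spend": 40},
    {"cluster": "A", "age": 28, "spend": 56},
    {"cluster": "B", "age": 41, "spend": 90},
    {"cluster": "B", "age": 49, "spend": 102},
]
summary = summarize_clusters(customers)
```

The `size` field alone corresponds to the count-only view; the age range and average spend are the kind of detail the article says the visual view brings forward.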

Many traditional BI tools can now peer into Hadoop by way of SQL-on-Hadoop interfaces and connectors, but Datameer's spreadsheet-on-Hadoop approach is much more user friendly, even for those who know SQL, according to Karen Hsu, senior director of product marketing at Datameer.

"IT types might use SQL to segment data, but then they'll jump out and use Datameer for grouping, joining, partitioning, unioning data, and identifying outliers," Hsu said. "You can do all that with SQL, but it's just much easier to do in a spreadsheet."

SQL-on-Hadoop tools will certainly have their place. But given that SQL was a nonstarter in democratizing small data analysis, the emphasis on Hive, Impala, and other SQL-driven tools seems like a half step toward what organizations really want from big data. With its 4.0 release, Datameer is banking on spreadsheets and visualizations as the way to give business analysts a hands-on way to conduct data analysis on top of Hadoop.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

4/29/2014 | 12:17:25 PM
Datameer 4.0 is available for download
Thanks for the piece. I checked their site and I believe they have released version 4.0 as a free trial. It is as you said and was fast!