Big Data Analytics: Time For New Tools - InformationWeek


08:36 AM
Doug Henschen

Big Data Analytics: Time For New Tools

So you're considering Hadoop as a big data platform. You'll probably need some new analytics and business intelligence tools if you're going to wring fresh insights out of your data.

Hadoop is steadily gaining adoption as an enterprise platform for capturing high-scale and highly variable data that's not easy or economically viable to store in relational databases. What's less clear is just how companies are going to analyze all this data.

A recent Forrester report declared that Hadoop is "no longer optional" for large enterprises. Our data suggests that train hasn't left the station just yet: Just 4% of companies use Hadoop extensively, while 18% say they use it on a limited basis, according to our just-released 2015 InformationWeek Analytics, Business Intelligence, and Information Management Survey. That is up from the 3% reporting extensive use and 12% reporting limited use of Hadoop in our survey last year. Another 20% plan to use Hadoop, though that still leaves 58% with no plans to use it.

But there's no doubt that interest in Hadoop is rising. The top draw is the platform's "ability to store and process semi-structured, unstructured, and variable data," cited by 31% of the 374 respondents to our survey involved with information management technology. Another 30% cited Hadoop's ability to handle "massive volumes of data," while 25% cited its "lower hardware and storage scaling costs" compared with conventional relational database management systems.

That's the IT, data-management perspective on the need for Hadoop. But why is the business looking to capture and analyze big data in the first place? The top driver, cited by 48% of respondents using or planning to deploy data analytics, BI, or statistical analysis software, is finding correlations across multiple, disparate data sources, like Internet clickstreams, geospatial data, and customer-transaction data. Next in line are predicting customer behavior, cited by 46%, and predicting product or service sales, cited by 40% of respondents (multiple responses allowed, see chart below). Other motivations include predicting fraud and financial risks, analyzing social network comments for customer sentiment, and identifying security risks.

In each of these examples, companies are analyzing big data sets in search of insights they couldn't discover by parsing only the same old data they've long held in transactional systems. Capturing and analyzing clickstreams, server log files, social network streams, and geospatial data from mobile apps is a recent, big-data-era phenomenon for most organizations attempting it, and they're gaining insights and seeing correlations that just weren't available in the enterprise data warehouse.
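
To make "finding correlations across disparate data sources" concrete, here is a minimal Python sketch. It is not drawn from the survey or from any vendor's product: the two dictionaries, the customer IDs, and the function names are all hypothetical, and the figures are made up purely for illustration. It joins clickstream data to transaction data by customer and computes a Pearson correlation, assuming at least two shared customers whose values are not all identical.

```python
from math import sqrt

# Hypothetical per-customer figures from two separate systems (illustrative only).
CLICKSTREAM_VISITS = {"c1": 2, "c2": 4, "c3": 6, "c4": 8}            # web logs: site visits
TRANSACTION_SPEND = {"c1": 10.0, "c2": 20.0, "c3": 30.0, "c5": 5.0}  # sales system: spend


def join_on_customer(left, right):
    """Inner join: keep only customers present in both sources, ordered by ID."""
    shared = sorted(left.keys() & right.keys())
    return [(left[c], right[c]) for c in shared]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs.

    Assumes at least two pairs and non-constant x and y values;
    otherwise the denominator would be zero.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    joined = join_on_customer(CLICKSTREAM_VISITS, TRANSACTION_SPEND)
    print(f"customers in both sources: {len(joined)}")
    print(f"visits vs. spend correlation: {pearson(joined):.2f}")
```

The join is the point of the sketch: customers c4 and c5 appear in only one system, so neither source on its own could reveal the relationship between visits and spend.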

But pulling insight out of this new data will require some new tools, ones that work alongside Hadoop -- which is, at its core, little more than a highly distributed file system paired with a batch-processing framework. Here are the three categories of options associated with Hadoop, along with product examples.

Hadoop-native data-processing and analysis options: These include Apache Hive (provides SQL-like data access -- think data warehousing meets Hadoop); Apache Mahout (supports machine learning on top of Hadoop -- think finding patterns in data); Apache MapReduce (for searching, filtering, sorting, and other forms of processing large data sets in Hadoop -- ways to boil down really big data to find the useful nuggets); and Apache Pig (a language for writing MapReduce jobs).

Alternative SQL access/analysis options: Hive is slow by relational database standards, and it doesn't support all SQL-analysis capabilities. These alternatives are designed to make BI professionals feel more at home, giving them the performance they're accustomed to, SQL- or SQL-like querying, and compatibility with current BI tools. Examples include Actian Analytics Platform SQL Hadoop Edition, Apache Drill, Cloudera Impala, HP Vertica For SQL on Hadoop, IBM Big SQL, Microsoft SQL Server PolyBase, Oracle Big Data SQL, Pivotal HAWQ, and Teradata QueryGrid.

Analytics and BI options designed to run on Hadoop: These tools blend SQL and BI-type querying with big-data-oriented and advanced analytics capabilities. Examples include Apache Spark, Apache Storm, Datameer, Platfora, and SAS Visual Analytics. Many of these analysis engines now run on Hadoop 2.0's YARN resource-management system.
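
For readers who haven't worked with the MapReduce model mentioned in the first category, the following is a rough, single-machine Python sketch of its three phases (map, shuffle, reduce) applied to a page-count problem. It doesn't use Hadoop's actual APIs; the clickstream records and all function names are hypothetical, and a real job would distribute each phase across a cluster.

```python
from collections import defaultdict

# Hypothetical clickstream records as (user, page); illustrative data only.
CLICKS = [
    ("alice", "/home"),
    ("bob", "/pricing"),
    ("alice", "/pricing"),
    ("carol", "/home"),
    ("bob", "/home"),
]


def map_phase(records):
    """Map: emit a (page, 1) pair for every click record."""
    for _user, page in records:
        yield page, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values for each key to get a per-page click count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_clicks(records):
    """Run the three phases in order on one machine."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    for page, total in sorted(count_clicks(CLICKS).items()):
        print(page, total)
```

Tools such as Hive and Pig exist largely so that analysts can express this kind of grouping and counting without writing the individual phases by hand.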

The first thing to note is that the SQL and SQL-like options -- including Hive, Impala, Drill, the various relational databases ported to run on Hadoop (Actian, HP, Pivotal), and the various SQL-access options (Microsoft, Oracle, Teradata) -- give you the basics of SQL query and analysis, but these are not alternatives to analytics workbenches or business intelligence suites. As noted, a key point of these query and access tools is making Hadoop compatible with incumbent SQL-connected products like BusinessObjects, Cognos, MicroStrategy, OBIEE, Tableau Software, and so on.

Businesses are demanding compatibility with tools that they already have on hand. This helps explain why there were so many SQL-on-Hadoop announcements from both Hadoop vendors (Cloudera, Hortonworks, MapR) and database incumbents (Actian, Hewlett-Packard, Oracle) over the last year.

But companies need more than SQL. The value in big data analysis is often in finding correlations among disparate data sets or insights hidden in semi-structured or highly variable data sources, such as


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...
User Rank: Apprentice
12/18/2014 | 11:27:10 AM
Data analytics is a powerful fraud prevention tool
Big data is useful when it comes to detecting and preventing fraud. Data analytics should be a key weapon in every company's fraud protection arsenal and can strengthen your internal controls. I work for McGladrey and there's a very informative whitepaper on our website that readers of this article will be interested in.
Ulf Mattsson,
User Rank: Strategist
12/18/2014 | 10:00:51 AM
I agree that it is “Time For New Tools.” What about security?
I recently read the Gartner Report "Big Data Needs a Data-Centric Security Focus" concluding "In order to avoid security chaos, Chief Information Security Officers (CISOs) need to approach big data through a data-centric approach" and "Analysts reckon that over 80 percent of organizations will fail to initiate a consolidated data security policy across all their data silos by 2016, resulting in potential non-compliance, security breaches and financial liabilities."

The report suggests that new data-centric audit and protection solutions and management approaches are required. I also noted that companies are starting to follow these guidelines. Hortonworks recently released these types of features, including data tokenization, advanced HDFS Encryption, key management and auditing.

I think it is time to re-think our security approach and be more data-centric.

Ulf Mattsson, CTO Protegrity
D. Henschen,
User Rank: Author
12/18/2014 | 9:14:33 AM
We need Analytics on Hadoop as much as or more than SQL on Hadoop
SQL on Hadoop = training wheels for big data analysis. Tools supporting machine learning, advanced analytics, data visualization, etc. on top of Hadoop are what's needed to make sense of high-volume and highly variable new data types. Apache Spark, Datameer, Platfora, SAS Visual Analytics, Alpine, Revolution Analytics and others are among the emerging options. Even Oracle recognizes that SQL isn't enough. Oracle offers Oracle Big Data Discovery, which starts with machine learning for data exploration and leads to various big data visualization and analysis options.