Actian, HP Vertica Join SQL-On-Hadoop Bandwagon - InformationWeek

Actian and HP Vertica separately challenge Cloudera Impala, follow Pivotal in adapting their databases to run on the big data platform.

Actian on Tuesday joined the long list of companies that have introduced a way to support SQL access and querying on top of Hadoop. The announcement comes just a week after HP upgraded SQL-on-Hadoop functionality it introduced late last year through its Vertica database.

Actian and HP join Pivotal (with Greenplum-based HAWQ) and InfiniDB among companies extending existing relational database management systems to run on top of the Hadoop Distributed File System (HDFS). Actian said it's going after Hadoop market-share leader Cloudera and its Impala offering, which was introduced last year as a faster, more SQL-compliant alternative to Hive.

The Actian Analytics Platform Hadoop SQL Edition, due out by the end of this month, beats Impala with even faster querying and ISO SQL-92 compliance, according to Actian CTO Mike Hoskins.

"We're offering full-functioning, SQL-complete functionality running natively on Hadoop, and we're also the highest-performing SQL database running on Hadoop," Hoskins told InformationWeek in a phone interview. "If you add those two together, we have an advantage that's hugely important for customers looking to empower their SQL users."

Actian internal research claims faster querying than Cloudera Impala.

The Actian Analytics Platform consolidates technologies the company has acquired, including the ParAccel and Vectorwise databases and the Pervasive DataRush data-integration software. The new SQL-on-Hadoop option uses what's now called the Vector engine for parallelized querying on HDFS. Actian's testing shows its query performance will be as much as 30 times faster than Impala, Hoskins said.

HP introduced SQL-on-Hadoop capabilities on its columnar Vertica database late last year by eliminating its proprietary storage layer so it could work with Hadoop-native file formats including JSON, Parquet, Thrift, and others. In last week's release, dubbed Dragline, HP eliminated all separation between Hadoop and Vertica clusters.

"That means Vertica can coexist with the Hadoop cluster, and we can access and query against HDFS data leaving it where it is," said Eamon O'Neill, HP's Vertica product manager, in a phone interview with InformationWeek. Vertica is also capable of running SQL queries against semi-structured data including clickstreams and Web session data, according to O'Neill.

Actian's architecture does not require a separate cluster, but it appears to be a step behind HP in that it has to load new data, or convert existing data inside Hadoop, into its proprietary database storage format to support SQL querying. Actian says support for Hadoop-native file formats is on the roadmap for a future release.

There's more to the Actian and HP announcements. Actian, for example, boasts 200 connectors to enterprise data systems and YARN-certified data processing and ETL on top of Hadoop. HP added to Vertica live aggregate lookups for enhanced customer personalization analysis, sentiment analysis against short text streams such as tweets, and improved workload-management features. But the big news for both companies is clearly SQL-on-Hadoop support.

Despite the profusion of options for using SQL against big data, Hive remains the most widely used query tool with Hadoop. On that front, Hortonworks says the latest generation of Hive offers greatly improved performance. Nonetheless, Hive and Impala both fall short of relational databases in SQL functionality, according to Forrester analyst Mike Gualtieri.

"Vendors have obsessed about performance, but the question is, can you run the queries you need to run?" Gualtieri told InformationWeek. "Impala still has work to do, but Actian, Pivotal, and Vertica are far more likely to support the queries that companies already have in use."

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Li Tan, User Rank: Ninja
6/4/2014 | 4:40:08 AM
Re: SQL is one thing, not everything.
SQL is important but definitely not everything, and not even the core in the big data era. The real purpose of big data is its analysis capability. We need to correlate unstructured data and draw meaningful conclusions from it. So SQL is a method, a facility, but not the goal in itself.
D. Henschen, User Rank: Author
6/3/2014 | 12:33:22 PM
SQL is one thing, not everything.
SQL is important, and that's why there have been so many announcements, but remember that the first and highest purpose for Hadoop is not to be an alternative platform for the same old structured data analyses. Hadoop's higher use is correlating structured and unstructured data and finding new insights in variable data such as clickstreams, log files, mobile data, social data and more. YARN will enable multiple modes of analysis. Spark, for example, is aspiring to support machine learning, streaming analysis, SQL and other ways of analyzing data. So SQL is one thing, but not everything.