HP Taps Vertica For SQL On Hadoop - InformationWeek


08:46 AM


HP brings fast, familiar SQL querying to Hadoop using Vertica database. Here's how it stands out from other big data analysis options.


Add HP to the growing list of vendors offering SQL analysis options on top of the leading big data platform, as the company on Monday announced the general availability of HP Vertica for SQL on Hadoop. 

In the works for months and partially exposed this summer through an earlier Vertica release, HP Vertica for SQL on Hadoop promises what other tools in this class promise: fast and familiar SQL-based querying on top of the increasingly popular big data store. According to the company, it stands apart in two ways. First, HP executives asserted, it offers more complete SQL functionality than Hadoop-native options such as Cloudera's Impala project, Apache Drill, and IBM Big SQL.

"We have a SQL query engine that's proven and that has a rich set of analytic capabilities," said Steve Sarsfield, product marketing manager in HP's big data business group, in a phone interview with InformationWeek.


SQL capabilities such as joins and merges are often lacking in "immature" Hadoop-native products, according to Sarsfield. He added that HP customers report they are "constantly running into bugs and stability issues with some of those products," though he declined to specify which products are buggy.

Second, compared with other relational databases that have been ported to run on top of Hadoop, such as Pivotal's HAWQ, based on the Greenplum database management system, or the Actian Analytics Platform SQL Hadoop Edition, based on Vectorwise, HP executives claimed that HP Vertica for SQL on Hadoop offers superior scalability and performance.

"Some of the customers that we've announced, like the Facebooks of the world, have done thorough evaluations of all the technologies available, and we get chosen by the largest and most demanding customers," said Jeff Healey, director of product marketing, HP Big Data platform.

HP claims more than 100 customers are working with Vertica 7.1, the summer release that first exposed SQL-on-Hadoop functionality. But only one customer, human resources firm Snagajob, was quoted in HP's press release about those capabilities. 

"With up to 25,000 job postings updates, over 400,000 active postings, and over one million unique visitors on our site every day, there is tremendous potential insight in all that data," said Robert Fehrmann, data architect at Snagajob, in the statement. "HP Vertica for SQL on Hadoop ... gives us an incredibly robust analytics tool to help understand and act on our information assets." 

HP's superiority claims aside, the attractions of Vertica for SQL on Hadoop include distribution-agnostic compatibility with Apache Hadoop, Cloudera, Hortonworks, and MapR deployments. The release also supports Hadoop-native file formats, including Parquet and ORC. HP says its per-node pricing model is "highly competitive," though it declined to release pricing details.

HP utilities let you manage Vertica's use of nodes, memory, and compute capacity, while the Hadoop cluster is managed with separate tools.

Whereas Hadoop-native SQL-on-Hadoop options such as Hive, Impala, and Drill rely on Hadoop 2.0's YARN resource management and on Hadoop-native security and data-governance systems, Vertica (like Pivotal HAWQ) does not run on YARN and has its own administrative and security controls. As a result, you'll have to watch how HP Vertica (or HAWQ) uses cluster compute and memory resources, because that usage could impinge on other workloads and their service levels. [Author's note: This article was corrected to reflect that the Actian Analytics Platform SQL Hadoop Edition is certified to run on YARN.]

"We are aware that YARN is important and that we need to take a look at it in the future, but for now it's colocated with the Hadoop cluster and you use our utilities to set aside nodes, memory, and the resources you need for Vertica analytics," said Healey of HP. 

Microsoft, Oracle, and Teradata have all stopped short of porting their databases to run on top of Hadoop. Instead they've offered Microsoft Polybase, Oracle Big Data SQL, and Teradata Query Grid to blend analysis of Hadoop data with information in database deployments. 

With their SQL-on-Hadoop offerings, HP, Pivotal, Actian, and others are betting that the data lake/data hub concept of using Hadoop as the epicenter of data management will take hold. You could call that aggressive and forward-thinking, but then, these vendors are market challengers with far fewer deployments to defend than incumbents such as Oracle, Microsoft, and Teradata. The bet on Hadoop is a bet that disruption will open up opportunities.

But big data demands more than just SQL analysis, because much of it involves data that can't be organized into columns and rows. Pivotal, for example, is touting MADlib for machine learning and statistical analysis, while Actian recently added a graph analysis engine. Apache Spark, a fast-growing in-memory analysis engine that runs on top of Hadoop, supports machine learning, streaming analysis, graph analysis, and R analytics as well as SQL querying.

HP executives said the Vertica community is experimenting with software that will run open source R analytics on the distributed database, but the vendor itself has no public roadmap to productize and support that software. As for working with unstructured data, executives said HP's Autonomy IDOL software offers options including text-, document-, sentiment-, and image-analysis capabilities, though it's not clear how Vertica and IDOL might work together. 

HP Vertica for SQL on Hadoop will clearly be of interest to any Vertica customer. But the real test of success will be its selection and use in place of Hadoop-native SQL-on-Hadoop options or Hadoop-to-database connections offered by the likes of Oracle, Microsoft, and Teradata.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Charlie Babcock, 11/17/2014 | 4:08:45 PM
Column-oriented database on top of Hadoop: How does that work for user?
Vertica, last I knew, was strictly a column-oriented SQL system. Aren't there both advantages and drawbacks to that, on top of Hadoop? If you're looking for big picture data, you can get it quickly via columnar access. If you're looking for details in the data, that's a different story.
D. Henschen, 11/17/2014 | 9:44:10 AM
IBM surprisingly quiet on big data access and analysis
IBM has offered IBM Big SQL as a native SQL-on-Hadoop option, but I haven't heard much about it other than IBM statements. They're also surprisingly quiet on options to synthesize/correlate Hadoop data with what's in DB2 and Netezza data warehouses. Anybody at IBM care to share highlights on SQL analysis of Hadoop data options other than Big SQL?