Hortonworks Certifies Spark On YARN, Hadoop - InformationWeek




Hortonworks catches up to Cloudera with YARN-managed implementation of Spark in-memory framework for machine learning on Hadoop.


Hortonworks announced Thursday that Apache Spark, a technology quickly gaining interest for in-memory-accelerated machine learning and other analyses on high-scale data, has been certified to run on Apache YARN, the resource-management layer introduced last year with Apache Hadoop 2.0.

With this milestone, Spark is ready to run as a technology preview on the Hortonworks Data Platform (HDP), which is Hortonworks' Hadoop software distribution. A production-certified release is expected by this fall.

This is not the first appearance of Spark on Hadoop. In February, Cloudera introduced support for Spark using its commercial Cloudera Manager software to deploy, manage, and monitor the software. MapR introduced its own Spark deployment in April. Hortonworks stressed that its approach is 100% open source, using YARN (Yet Another Resource Negotiator) to manage and monitor Spark components and workloads alongside other systems and analyses running on Hadoop.


"Spark is now natively integrated into Hadoop, so its resources -- CPU, memory, and so on -- can be managed along with the other workloads running on a Hadoop cluster," explained Shaun Connolly, Hortonworks' VP corporate strategy, in an interview with InformationWeek. "That's important to get right because Spark is memory- and CPU-intensive, and you don't want to have to have siloed clusters dedicated to running those workloads."

The whole point of Hadoop 2.0 and YARN is to be able to run multiple workloads -- including Accumulo, Hive, MapReduce, Pig, Storm, Solr, and now, Spark -- against the same data sets, Connolly added.

Asked for comment on Hortonworks' announcement, Cloudera sent InformationWeek the following statement:

"Cloudera developers were the key drivers on YARN support for Spark, leveraging our expertise in YARN as well our developer group on Spark. Cloudera Manager is not orthogonal to YARN support and in fact, Cloudera Manager supports Spark on YARN. Additionally, almost all our customer deployments of Spark today are on top of the YARN framework and we have many customers who are running Spark through us."

Concurrent with Hortonworks' announcement, Spark developer and support provider Databricks announced that Hortonworks is an inaugural member of its Certified Spark Distribution program.

"We're committed to ensuring all Spark users have a terrific experience -- and we're thrilled that Hortonworks shares this vision," said Databricks business development executive Arsalan Tavakoli-Shiraji in a statement. "With the designation of Apache Spark as YARN Ready, enterprises can rest assured that Spark can run simultaneously and effectively with other mission-critical applications."

Customers are now free to download and install the HDP 2.1 Tech Preview Component of Apache Spark on the current HDP 2.0 distribution. Hortonworks expects the HDP 2.1 release, which will include Spark, to be certified for production use "within a handful of months," said Connolly. Hortonworks will support Spark along with the other software included in the distribution.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data, and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

User Rank: Apprentice
7/4/2014 | 1:49:35 PM
Re: That's nice, but ...
I am from Cloudera and have committed about 50 patches to Spark. Same goes for a few other people here. What are you looking at?
User Rank: Apprentice
7/4/2014 | 1:34:53 PM
Re: That's nice, but ...
None of the three 'race horses' (MapR, Cloudera, Hortonworks) seem to have contributed to Spark development.

UCB and Databricks (Spark is their main focus) seem to have the most committers.



Charlie Babcock
User Rank: Author
6/30/2014 | 9:28:57 PM
Spark keeps Hadoop competitive
It looks like there is healthy competition between these companies that will do much to keep their respective Hadoop systems competitive. MapR Spark, Hortonworks Spark on YARN, and Cloudera Manager's support for Spark are pushing the boundaries of big data.
User Rank: Apprentice
6/26/2014 | 6:23:02 PM
That's nice, but ...
... would be nicer if more than 0 people from Hortonworks made any contribution to Spark. Or if you could actually run Spark in production with HDP.
D. Henschen
User Rank: Author
6/26/2014 | 4:25:41 PM
Another case of commercial management tool versus open-source management tool
For more on Cloudera's options for implementing Spark, including on YARN, click here. Given Cloudera's use of YARN, the key difference between Cloudera's and Hortonworks' use of Spark seems to boil down to the management software used for deploying, monitoring, and managing the software (YARN manages the workloads). In Cloudera's case it's the commercial Cloudera Manager software. In Hortonworks' case it's the open source Ambari software, but Ambari support is part of what Hortonworks is still working on at this point. Reading between the lines, I would not expect HDP 2.1 to become generally available until this fall.