Spotify Embraces Hortonworks, Dumps Cloudera - InformationWeek

09:15 AM

World's largest music service switches Hadoop distributions to take advantage of Hortonworks' Hive improvements and support services.

Spotify, the 24-million-user-strong music service based in Stockholm and London, announced Monday that it's migrating its massive, 690-node Hadoop cluster from Cloudera's software distribution to the Hortonworks Data Platform (HDP) and Hortonworks enterprise support.

Among the largest Hadoop implementations in Europe, Spotify's cluster is used to develop the analytics behind the company's personalized services, such as Spotify Radio. It also supports data analyses for advertisers and partners. For example, Spotify can segment listeners to help advertisers target ads, and it can analyze the geography of listening patterns to help record labels and artists choose the best concert locations.

"[Hortonworks'] true open source approach and the work they have done to improve the Apache Hive data warehouse system aligns well with our needs," said Wouter de Bie, team lead for data infrastructure at Spotify, in a statement. "We use Hive extensively for ad-hoc queries and for the analysis of large data sets."

Most Hadoop software distributors have backed the so-called SQL-on-Hadoop movement this year (Cloudera with Impala, IBM with Big SQL, MapR with Drill, and Pivotal with HAWQ), but Hortonworks alone is pursuing it by improving Hadoop's existing Hive interface through its Stinger initiative.


Hive relies on behind-the-scenes MapReduce processing, which has a reputation for being slow, but Hortonworks executives insist that the company's design improvements will drive a 100X performance improvement that will yield ad-hoc query results within "a handful of seconds."

"Spotify is undertaking some really innovative work in the data analytics field and realized the need for a deep level of open source Apache Hadoop domain experience and expertise," commented Herb Cunitz, president of Hortonworks, in a statement.

Spotify launched in 2008 and soon afterward set up a 30-node cluster on Amazon Web Services. Less than two years ago, the company switched to a 60-node on-premises cluster, which it has since scaled out quickly to today's 690 nodes. The company collects more than 200 gigabytes of compressed user activity data per day and has more than 4 petabytes of capacity in its cluster.

Spotify could not be reached in time to comment on whether it had simply been using Cloudera's distribution of open source software or was also employing Cloudera's commercial management software and support services. Spotify is said to have a highly skilled internal Hadoop team of more than 12 engineers that would seem quite capable of running Hadoop independently. That team developed Luigi, a Python framework for batch data processing, dependency resolution and monitoring of Hadoop jobs, which Spotify has since contributed to open source.

"The cultural fit was an important factor in our selection and we have appreciated Hortonworks' relaxed, helpful and open approach," said Wouter de Bie. "We were looking for a true partner relationship and the team at Hortonworks [is] committed to enabling the overall ecosystem."

D. Henschen, User Rank: Author
9/26/2013 | 8:09:53 PM
re: Spotify Embraces Hortonworks, Dumps Cloudera
I finally got the telling answer from Wouter at Spotify, and it's as I suspected:

"We were not using Cloudera's commercial management software or support beforehand," says Wouter. "Everything was done in-house, but we were running CDH."

Kind of takes the bite out of "dumps" for Cloudera.

D. Henschen, User Rank: Author
9/17/2013 | 3:54:16 PM
Check out this big presentation from Wouter de Bie on Spotify's implementation and uses of Hadoop. I didn't see any mention of Cloudera in the slides, so I suspect Spotify is another of the many enterprises that have been setting up and supporting Hadoop clusters on their own (without support from the likes of Cloudera or Hortonworks). That's clearly changing now at Spotify with the selection of Hortonworks, but I'm still waiting to hear whether it was actually using proprietary Cloudera management software and/or support services.