Does Hadoop Have a Speed Problem? - InformationWeek


Does Hadoop Have a Speed Problem?

"Fast data" -- information provided in near-real time -- is key to getting the most out of analytics projects, says ParStream CEO Michael Hummel.

If you're familiar with the classic three Vs model of big data -- that's high volume, high velocity, and high variety -- you're also aware that big data means much more than simply stockpiling petabytes of information. The ability to analyze data streams in near-real time is essential, too, and a variety of tools is emerging to fill this need.

One such tool is ParStream, a real-time database for big-data analytics. Its developer, an enterprise software company also named ParStream, is based in Cologne, Germany. It was founded in 2008 by Michael Hummel and Jörg Bienert, the firm's CEO and CTO, respectively. In a phone interview with InformationWeek, Hummel said the big-data marketplace is changing rapidly as organizations begin to see the value of "fast data," the ability to analyze data nearly in real time. He also called Hadoop overhyped, but acknowledged that the open-source software framework has become a de facto standard of sorts for big data.

"Definitely there is a de facto standard, which is called Hadoop. But people do not actually mean Hadoop when they say Hadoop," said Hummel.

How so?

"Hadoop is a multilayered system, and the lowest level of the system is called HDFS: Hadoop Distributed File System. And that's what people mean when they say, 'We store all the data in Hadoop,' they actually mean that they store it on a distributed file system called HDFS. And all the systems use the data stored there, and can access it from there," he said. MapReduce is just one of many options, including ParStream, that can analyze this data, he added.


"Nowadays, there are solutions that can handle petabytes of data," said Hummel. "We are able to do it. The MapReduce approach was a very, very good first step in that direction. It made it possible." But in the world of big data, as with other emerging technologies, the first solution is usually not the best long-term choice, he said.

"Today, people talk about speed, they talk about real time. They talk about making data accessible at your fingertips. So we're talking about sub-second response times. But Hadoop was not made for that. It was made for long-running queries that come back after 14 days."

ParStream offers a real-time database for big-data analytics.

Not surprisingly, that's a problem for organizations seeking fast-data analysis. Hadoop "was never made for interactive analytics, which is the big thing at the moment," said Hummel. "People and companies see it as absolutely relevant to be able to analyze data in a very, very short time." Fast data, he added, means "that you don't consider only the data from yesterday, but also the data from now."

For instance, fast-data analytics can benefit retail sites, particularly those where it's difficult to react immediately to shoppers' behaviors and make product or service recommendations. "That's a missed opportunity," said Hummel. "Think of people who fill up their shopping baskets, and then don't do anything on that website for five minutes. Perhaps they're not interested anymore, or they're distracted. Maybe they have found something better on a different website."

Fast-data analytics allow businesses to respond more rapidly to their customers and improve the buying experience. Hummel added: "Being able to engage with these people while they are still on the website -- or when they've just left -- makes much more sense than waiting until the next day to send out a reminder to say, 'OK, we'll give you a 5 percent bonus if you buy today.'"


User Rank: Apprentice
11/22/2013 | 6:19:31 PM
Re: How fast is too fast?
Thomas, I don't think that fast data is the problem here; it's the automated responses that can become problematic. The data speed is simply an enabler.

Wall Street experienced problems because its automated trading systems not only recommend actions but also execute them, all without human intervention.

To obviate any complications, and until these automated systems learn to think about the consequences of their actions, we can rely on the recommendations but should always have decision-makers involved in the execution.
User Rank: Moderator
11/21/2013 | 3:30:33 PM
Re: Big Data
Jeff, HPCC Systems is an open-source, data-intensive supercomputing platform for processing and solving Big Data analytical problems, and it is extremely good at what it was designed to do: massive ingestion of semi-structured (or unstructured) data, conversion to a normalized form (like RDBMS tables), and analytics in an easy-to-use, SQL-like language (much more powerful than SQL). In addition, it cleanly integrates with Apache Kafka to provide near-real-time analytics. More info at
User Rank: Strategist
11/21/2013 | 6:56:55 AM
Re: How fast is too fast?
I agree with you, Samicksha, but Hadoop fans should note one thing: Hadoop provides no security model, i.e., it cannot detect a man-in-the-middle attack between nodes.
User Rank: Strategist
11/21/2013 | 2:59:57 AM
Re: How fast is too fast?
The impressive part of Hadoop is its flexibility: you can size the cluster to suit the task at hand.
Thomas Claburn
User Rank: Author
11/20/2013 | 4:34:46 PM
How fast is too fast?
Given the problems Wall Street has had with automated trades, I wonder whether web sites that rely on rapid-fire analytics data and automated responses will self-optimize too quickly, magnifying problems rather than solving them.