Real-Time Analytics: Ready For Its Close-Up? - InformationWeek



Real-Time Analytics: Ready For Its Close-Up?

Continuous, real-time analysis based on stream processing could be the next big thing in big data.


One of the knocks against Apache Hadoop has been that it was built as a batch processing system and hence is no good for real-time data analytics. Hadoop 2.0 promises a lot of improvement in this area, however. Its YARN resource management layer, for instance, offers better support for stream-processing platforms such as Storm, which recently became an Apache open-source project. Hadoop's shortcomings have also created an opportunity for stream-processing technology providers, which have been busy partnering up with Hadoop vendors.

A growing number of companies are entering the real-time, stream-processing space, including Vitria, a 20-year-old Silicon Valley firm. According to Vitria co-founder and chief technical officer Dr. Dale Skeen, the market for continuous, real-time analysis is quickly evolving from "visionary" early adopters to more mainstream use.

"We're seeing the transition into what I would call the early majority market of this new technique," Dr. Skeen said in a phone interview with InformationWeek.

Skeen knows the big-data market well. He cofounded Vitria with Dr. JoMei Chang in 1994 and has more than 20 years of experience building large-scale distributed computing and database systems. Before starting Vitria, Skeen cofounded Tibco Software, an infrastructure software provider, and he has held faculty positions at the University of California, Berkeley, and Cornell University.


There's an important distinction -- one often misunderstood -- between continuous, real-time streaming analytics and other types of operational intelligence tools that offer "on-demand, near real-time" analytics built more for forensic analysis, Skeen said.

"We build a real-time operational intelligence platform," said Skeen. "We're talking about a type that is continuously monitoring based on streaming analytics. It's constantly assessing the situation and immediately taking action if something goes awry -- or if an opportunity presents itself."

By comparison, the on-demand, near-real-time approach has a different set of attributes.

"It's very valid technology, but it's mainly used for investigations," said Skeen. "With on-demand, you have to ask the right question at the right time... and then you get the answer back."

Miss a significant event, he added, and you may lose the chance to correct a critical issue or take advantage of a business opportunity.

"Then you're flying blind, and that's the big drawback with on-demand," Skeen claimed.

The continuous real-time approach, however, is always monitoring.

"The moment something interesting happens, where there's an opportunity to sell more to a customer, or there's a threat -- a bad guy is trying to break into your system or get money -- you immediately detect that and can take action," Skeen noted.

He added: "Everyone talks about actionable intelligence, well, we have real-time intelligence with action. You can completely automate some of these actions with business processes or rules... or make human-guided workflow."
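
To make the idea concrete, here is a minimal Python sketch of continuous monitoring, in which every incoming event is checked against standing rules the moment it arrives and a match is either acted on automatically or routed to a person. It is an illustration only, not Vitria's implementation: the Event and Rule classes, the rule names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Event:
    kind: str       # hypothetical event type, e.g. "purchase"
    customer: str
    value: float    # purchase amount or failed-login count


@dataclass
class Rule:
    name: str
    matches: Callable[[Event], bool]
    automated: bool  # True: act immediately; False: queue for a person


def monitor(events: Iterable[Event], rules: List[Rule]):
    """Check every rule against every event as it arrives."""
    actions: List[Tuple[str, str]] = []       # handled automatically
    review_queue: List[Tuple[str, str]] = []  # human-guided workflow
    for event in events:
        for rule in rules:
            if rule.matches(event):
                target = actions if rule.automated else review_queue
                target.append((rule.name, event.customer))
    return actions, review_queue


# Hypothetical standing rules: one threat, one sales opportunity.
RULES = [
    Rule("block_login",
         lambda e: e.kind == "failed_login" and e.value >= 5,
         automated=True),
    Rule("upsell_offer",
         lambda e: e.kind == "purchase" and e.value > 500,
         automated=False),
]

stream = [
    Event("purchase", "alice", 120.0),
    Event("failed_login", "bob", 6),
    Event("purchase", "carol", 900.0),
]

actions, review_queue = monitor(stream, RULES)
print(actions)       # automated responses
print(review_queue)  # items waiting for a human decision
```

Nobody has to ask a question here. The rules sit in place and the stream flows past them, which is the difference from the on-demand model Skeen describes.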

In industries such as banking, these are critical scenarios where minutes, seconds, or even milliseconds matter.

"Fraud, for example -- dispensing cash at an ATM," said Skeen. "Would you rather discover it after the fact and investigate why it happened and why you dispensed that cash in that situation? Or would you rather discover it while it's happening and perhaps be able to shut it down?"

Vitria's customers include European mobile carrier O2, which runs the company's stream-processing platform for spam and fraud detection, as well as for customer service.
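
The ATM scenario Skeen raises can be sketched the same way. The Python example below keeps a short sliding window of recent withdrawals per card and decides whether to dispense cash before the money leaves the machine. The ten-minute window, the three-withdrawal limit, and the WithdrawalMonitor class are assumptions made for illustration; the article does not say how any bank or vendor actually scores fraud.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600   # hypothetical: look back ten minutes
MAX_WITHDRAWALS = 3    # hypothetical: approved withdrawals allowed per window


class WithdrawalMonitor:
    def __init__(self, window=WINDOW_SECONDS, limit=MAX_WITHDRAWALS):
        self.window = window
        self.limit = limit
        # card id -> timestamps (seconds) of approved withdrawals
        self.recent = defaultdict(deque)

    def approve(self, card, timestamp):
        """Return True to dispense cash, False to block the request."""
        history = self.recent[card]
        # Drop withdrawals that have aged out of the window.
        while history and timestamp - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.limit:
            return False  # too many recent withdrawals: stop it now
        history.append(timestamp)
        return True


atm = WithdrawalMonitor()
requests = [
    ("card-1", 0), ("card-1", 60), ("card-1", 120),
    ("card-1", 180),   # fourth request inside ten minutes
    ("card-2", 200),   # a different card is unaffected
    ("card-1", 700),   # earlier withdrawals have mostly aged out
]
decisions = [atm.approve(card, ts) for card, ts in requests]
print(decisions)
```

Because the decision is made per request, the suspicious fourth withdrawal is refused while it is happening instead of turning up later in an investigation.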


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

2/10/2014 | 5:32:48 PM
Real-Time Analytics
Hadoop opened people's eyes to the value found in large pools of data, but the delay imposed by batch processing dilutes that value considerably. Real-time (or near real-time) analytics are far more valuable. Stream processors (like Twitter Storm, LinkedIn Samza, Yahoo S4, Amazon Kinesis, Microsoft StreamInsight, etc.) provide the ability to process/filter/count/transform the data in real time, but the window of data visibility is limited and fleeting. Think of it as looking out the side window of a car doing 100 MPH. You can count things and do limited "processing," but only in looking back do you get the true lay of the land. Stream processors require persistence to enable historical analytics. The problem is: (a) in-memory systems are far too expensive given the huge data volume; and (b) disk-based systems cannot keep up with the flood of fast data, causing an impedance mismatch. Further, as with Hadoop, stream processing needs SQL support to spur corporate adoption. Hadoop+YARN is interesting, but the underlying large-grain file system is incompatible with the smaller "block-sized" data found in streams.



