ClearStory Data Touts Speedy Big Data Analysis - InformationWeek

3/30/2015 03:06 PM

ClearStory taps HDFS and Apache Spark in the cloud to let business users blend high-scale, variable data and analyze it with in-memory speed.

It combines the scalability and variable-data adaptability of Hadoop, the in-memory analysis speed of Apache Spark, and the agility and usability of a cloud-based tool designed for business analysts.

These are the traits that ClearStory Data promises. With a new release of its cloud service announced on Monday, the company said it's delivering greater control over data blending and analysis, more types of analyses, and better performance, thanks to behind-the-scenes integration of the latest (version 1.2) data-processing engine from Apache Spark, the distributed, in-memory analytics platform.

"Previously customers would load their data and use our tool to find correlations using our data-harmonization engine, but it was almost like a black box," said Vaibhav Nivargi, ClearStory's co-founder and chief architect, in a phone interview with InformationWeek. "With the new release, we're striking a balance between the simplicity of delivering automated recommendations and giving power users a lot more flexibility and control over how they harmonize data."

When users upload data into the ClearStory service, it's stored on a Hadoop Distributed File System (HDFS). This infrastructure, which is managed entirely by ClearStory, lets customers blend a variety of high-scale data without predefined data modeling or complex ETL work. The data is then blended, and notable overlaps and correlations are exposed after processing in Apache Spark's core in-memory query-optimization engine. Business users work in a ClearStory-developed Storyboard analysis environment rather than using Spark tools such as Spark SQL, MLlib, Spark Streaming, or GraphX.

"Business users who can conceptually understand forecasting, clustering, or segmentation don't want to be burdened with picking algorithms and parameters or creating and serializing models," said Nivargi. "With Storyboards you can do statistical operations, find correlations in data, drill in or out based on attributes in the data set, and you can bring in external data sets and create joins, which we call harmonization."

Storyboards are more flexible than dashboards, according to Nivargi, because they can be changed, adapted, and augmented with new data by business users, whereas dashboard changes often have to be handled by IT staff or power users.

In a retail sales-analysis scenario, business users at a consumer packaged goods company could use ClearStory to blend and analyze disparate data from retailers and third-party sources.
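Nivargi describes harmonization as bringing in external data sets and creating joins. To illustrate what such a join does in a retail scenario, here is a minimal Python sketch that assumes small in-memory lists of records. The `harmonize` function and the `store_id`, `units`, and `region` fields are hypothetical; ClearStory's own engine runs on Spark, and its implementation is not described in the article.

```python
from collections import defaultdict


def harmonize(left, right, key):
    """Left-join two lists of dict records on a shared attribute.

    Hypothetical illustration only, not ClearStory's API. Rows in `left`
    with no match in `right` are kept unchanged, as in a SQL LEFT OUTER
    JOIN. If both sides share a non-key field, the `right` value wins.
    """
    # Index the right-hand data set by join key for constant-time lookups.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    blended = []
    for row in left:
        # .get() avoids inserting empty lists for keys with no match.
        matches = index.get(row[key], [])
        if not matches:
            blended.append(dict(row))
            continue
        # Emit one blended row per matching right-hand row.
        for match in matches:
            merged = dict(row)
            merged.update({k: v for k, v in match.items() if k != key})
            blended.append(merged)
    return blended


# Hypothetical retailer sales data blended with a third-party data set.
sales = [{"store_id": 1, "units": 120}, {"store_id": 2, "units": 75}]
demographics = [{"store_id": 1, "region": "West"}]

for row in harmonize(sales, demographics, "store_id"):
    print(row)
# {'store_id': 1, 'units': 120, 'region': 'West'}
# {'store_id': 2, 'units': 75}
```

A left join keeps every sales row even when the external data set has gaps, which is usually what an analyst blending a partial third-party source wants. An inner join would silently drop the unmatched stores.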

With its combination of graphical data-exploration and data-analysis capabilities, the ClearStory service seems to have much in common with Databricks Cloud, the Spark-based service (currently in beta) offered by the developer and promoter of Apache Spark. Other products that come to mind include Platfora and Datameer, though these are on-premises tools (with the latter having a software-hosting option).

ClearStory is different from Databricks Cloud because the latter is "something for more sophisticated users, including data scientists, who are comfortable coding in Scala, Spark SQL, or Python," according to Nivargi. And ClearStory doesn't compete with Platfora and Datameer, he said, because those tools are deployed on top of customer-managed Hadoop deployments. ClearStory, in contrast, manages the data infrastructure behind its services in the cloud, and that complexity is not exposed to the customer.

As another differentiator, ClearStory touts the data-lineage and data-access controls that regulated businesses require. The new release is said to show the origin of source data and its original structure and shape, even after it's blended into larger data sets exposed and analyzed within ClearStory. Also new in the upgrade is a guided user model designed to enable line-of-business users without deep IT or BI training to access, prepare, blend, and harmonize data.
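The article does not say how ClearStory records lineage. One common approach, sketched here in Python as an assumption rather than ClearStory's design, is to tag every blended value with the data set and column it came from, so the original source can still be recovered after blending. The `Traced` class, the `blend_with_lineage` function, and the `sku`, `units`, and `price` fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Traced:
    """A blended value plus where it came from (hypothetical lineage tag)."""
    value: object
    source: str  # name of the data set the value was loaded from
    column: str  # column name in that original data set


def blend_with_lineage(sources, key):
    """Blend named data sets on `key`, tagging each value with its origin.

    `sources` maps a data-set name to a list of dict records. Assumes at
    most one row per key value in each source; a later row would overwrite
    an earlier one. Returns {key_value: {"source.column": Traced, ...}}.
    """
    blended = {}
    for source_name, rows in sources.items():
        for row in rows:
            record = blended.setdefault(row[key], {})
            for column, value in row.items():
                if column == key:
                    continue
                # Qualify the field with its source so same-named columns
                # from different data sets do not overwrite each other.
                record[f"{source_name}.{column}"] = Traced(value, source_name, column)
    return blended


blended = blend_with_lineage(
    {
        "retailer": [{"sku": "A1", "units": 40}],
        "syndicated": [{"sku": "A1", "units": 38, "price": 2.5}],
    },
    key="sku",
)
print(blended["A1"]["syndicated.units"])
# Traced(value=38, source='syndicated', column='units')
```

Keeping the original column name next to the source-qualified field is what lets a user ask, for any number in a blended view, which data set and which field produced it.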

ClearStory boasts a high-profile list of customers it can name, including Coca-Cola, Dannon, Del Monte, and Merck.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...