Commentary
2/13/2014
09:57 AM
Doug Henschen

9 Key Big Data Developments From Strata

We analyze the important news from SAS, Hortonworks, MetaScale, and others at the Strata conference, as big data seeks a productive next chapter.



O'Reilly's Strata 2014 conference is in full swing in Santa Clara, Calif., this week, and show organizers are turning a page with the conspicuous absence of the term "big data" from the major themes and conference tracks. It's another sign that people are ready to go beyond the comic book version of what's happening with data.

"Making Data Work" is the aspirational theme of this year's conference, and the tracks promise a more nuanced novella with topics including "Connected World" (Internet of Things), "Data in Action" (real-world case studies), "Data Science" (skills, techniques, and strategies), "Ethics, Policy, and Privacy" (can we actually do anything about these?), "Design" (data-visualization and interfaces), and "Hadoop and Beyond" (tools and technologies).

Many vendors making announcements at Strata have yet to pick up on the emphasis on productivity over hyperbole. The big-data buzz talk seems to be ladled into press releases in inverse proportion to what can be stated about specific capabilities and, more importantly, named customers citing real-world business benefits. 

We'll skip the news here, therefore, about venture capital rounds and stealth companies and focus instead on nine more notable announcements from Strata in three categories:

Analytics at Scale

SAS In-Memory Statistics for Hadoop: SAS has progressed from an Access connector to Hadoop to delivering SAS Visual Analytics and SAS High-Performance Analytics products capable of running on Hadoop. The news this week is SAS In-Memory Statistics for Hadoop, which takes advantage of the vendor's capabilities to perform data analysis on high-scale, in-memory clusters.

SAS In-Memory Statistics for Hadoop, to be released in the first half of this year, will enable multiple users to "simultaneously and interactively manage, explore, and analyze data, build and compare models, and score massive amounts of data in Hadoop." Selected data from Hadoop is loaded into memory once for iterative analysis across multiple users, avoiding time-consuming rounds of writing to and reading from disk.
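The load-once pattern described above can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions, not SAS's implementation: `InMemoryTable`, `read_from_cluster`, and the sample rows are hypothetical names and data invented for the example.

```python
import statistics
from collections import defaultdict


def read_from_cluster():
    """Hypothetical stand-in for an expensive read of selected data from Hadoop."""
    return [
        {"region": "east", "sales": 120.0},
        {"region": "west", "sales": 80.0},
        {"region": "east", "sales": 100.0},
    ]


class InMemoryTable:
    """Loads rows once through a loader, then serves every analysis from memory."""

    def __init__(self, loader):
        self._loader = loader
        self._rows = None
        self.load_count = 0  # how many times the expensive read actually ran

    def rows(self):
        if self._rows is None:  # only the first caller pays for the read
            self._rows = list(self._loader())
            self.load_count += 1
        return self._rows


def mean_sales(table):
    """One analysis pass over the in-memory rows."""
    return statistics.mean(row["sales"] for row in table.rows())


def sales_by_region(table):
    """A second, different analysis that reuses the same in-memory rows."""
    totals = defaultdict(float)
    for row in table.rows():
        totals[row["region"]] += row["sales"]
    return dict(totals)
```

Two analysts calling `mean_sales` and `sales_by_region` on the same table trigger a single read. A real multi-user system would also need locking, memory management, and distribution across a cluster, all of which this sketch leaves out.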

SAS also promises to eliminate "a patchwork of tools" and "the need for different analytic programming languages," but this hints at a SAS-only world that might not go down well with open-source-minded Hadoop fans. Analysis options are said to include clustering, regression, generalized linear models, analysis of variance, decision trees, random decision forests, text analytics, and recommendation systems. We're eager to see how open this world might be and how it combines a memory cluster with a Hadoop cluster (or could they possibly be one and the same?).

Alpine Chorus: We gave you a preview of what Alpine Data Labs' new Alpine Chorus product offers in our recent "2014 Analytics, BI, and Information Management Survey." Alpine is calling it "The Sharepoint of Data Science."

The idea behind Chorus is to break down complex, iterative analytics workflows into discrete, understandable steps that can be shared with and controlled by business users. The goal is to eliminate the time-consuming back-and-forth between business users who know what they want and data wonks who were previously the only ones who could deliver results. Havas Media, the beta customer we interviewed in our report, said it gives business users and data analysts a shared workflow and "a common language" for analytic exploration. Chorus can do its distributed "in-cluster" work on top of Hadoop if you choose, avoiding data movement from your high-scale data store.


Easy Alliances or Powerful Partnerships?
Alliances are easy to announce, but results are the only proof of powerful partnerships. We'll see if we hear more about these four announcements:

Hortonworks and Red Hat: This partnership is about making it easier to run the Hortonworks Data Platform (HDP) on Red Hat enterprise software and JBoss middleware. Specifically, the companies have released a beta plug-in for running HDP on Red Hat Storage Server to create a general-purpose storage pool with Hadoop, POSIX (Portable Operating System Interface), and OpenStack Object Storage interfaces. Red Hat has also tweaked JBoss Data Virtualization so it can ingest data from HDP.

[ Join InformationWeek's Doug Henschen in a 2/13, 2 pm ET interview on "16 Top Big Data Analytics Platforms." Video will be archived. ]

HP Vertica and its Marketplace: HP can't outmuscle the likes of IBM or Oracle on software, so it's opening up its Autonomy and Vertica assets as platforms on which third-party ISVs, integrators, and consultants can build big (and not so big) data applications. Autonomy's play will include a heavy dose of granular cloud services for "human data" analysis. The plan for HP's high-scale analytic database is the new HP Vertica Marketplace. It's a spot where developers can download the free, three-node Vertica Community Edition and tap connectors, add-ons, plug-ins, and extensions developed through an "innovations incubation" program.

HP didn't mention any prominent partners showing wares in the new marketplace, but its own incubator projects open for partner development include HP Vertica Distributed R, intended to scale up data analysis based on the R programming language; HP Vertica Pulse, an in-database sentiment analysis tool; and HP Vertica Place, for low-latency analysis of geospatial data.

DataStax and its Partner Network: The short list on this new roster of "solutions, applications, infrastructure, and ecosystem partners" includes Accenture, Google Cloud Platform, GoGrid, HP, Pentaho, and WibiData. DataStax offers software and support for the open source Cassandra database, and it can run search and Hadoop-based analytics on the same cluster. NoSQL rivals charge that Cassandra is complex, so it's no surprise the DataStax Partner Network is intended to promote closer-to-finished apps and implementations and to put more feet on the street to "sell, implement, service, and support DataStax software and solutions."

GoGrid and friends: GoGrid says it offers the fastest cloud in the west, with a mix of high-powered servers, fast storage choices, and bare-metal deployment options that others can't touch. To put some meat on those bones it announced partnerships this week with Basho, DataStax, Hortonworks, and MongoDB, all of which it says cloud customers will be able to deploy in push-button fashion.

The enemy here is Amazon Web Services, and GoGrid cautions would-be big data practitioners not to get locked into its "proprietary" world of "closed" services including Elastic MapReduce, DynamoDB, Redshift, and Kinesis. With third-party tools on GoGrid, you can bring the technology on-premises down the road or get into hybrid deployment scenarios. That option also exists with third-party tools on AWS, but you can forget about bare metal and GoGrid's high-performance options.


Hadoop and Beyond
Technical announcements are inevitable at a big data, er, "data at work" conference. Strata 2014 saw more than a few, but here's a short list:

MetaScale Appliances: In a bit of a surprise announcement, MetaScale, the subsidiary of Sears Holdings, announced that it's offering a "line" of branded Hadoop appliances that will run Cloudera, Hortonworks, or another distribution of the customer's choice. Mind you, Cloudera, Hortonworks, and others have hardware partners that offer everything from recommended configurations to single-SKU, software-preinstalled options (as in the case of the Oracle Big Data Appliance running Cloudera, for example).

I had it in my mind that MetaScale was a consulting organization aimed at helping big companies exploit their data with help from Hadoop and related tools, technologies, and analytics. Its expertise, developed first at Sears, is particularly relevant to companies paying big bucks for mainframe compute cycles. That impression was formed after spending a day with MetaScale executives in their offices and reporting on achievements at Sears. Okay, that was 16 months ago, and executive ranks, and perhaps priorities, have since changed.

Maybe MetaScale wants to get its foot in the door earlier in the process by helping you with the basics of deploying a Hadoop cluster. But we understood this outfit's real value to be delivered higher up in the stack, helping customers to understand how to take advantage of data and reinvent legacy processes with the aid of a big data platform.

Couchbase 2.5: This upgrade of the highly scalable, NoSQL database promises better performance through Rack Awareness for high availability and better security through cross-data-center data encryption. With Couchbase Server 2.5's Rack Awareness, administrators can create logical groupings of Couchbase Server nodes and replica copies of data that are automatically distributed across server nodes on different racks. This ensures that data is secure despite disruptions such as power outages or switch or rack failure, according to Couchbase. Building on existing cross-datacenter replication capabilities, the 2.5 update adds a secure data-encryption option whereby data moving across wide area networks can be transmitted using SSL encryption between datacenters.
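The rack-aware replica idea can be illustrated with a short Python sketch. It is not Couchbase's actual placement algorithm; `place_copies`, `surviving_partitions`, and the rack and node names are hypothetical, and the round-robin rule is only an assumption chosen to keep the example small.

```python
def place_copies(nodes_by_rack, num_partitions, num_replicas):
    """Give each partition one active copy plus replicas, each on a different rack."""
    racks = sorted(nodes_by_rack)
    copies = num_replicas + 1  # the active copy plus its replicas
    if copies > len(racks):
        raise ValueError("need at least as many racks as copies")
    if any(not nodes_by_rack[rack] for rack in racks):
        raise ValueError("every rack needs at least one node")

    placement = {}
    for partition in range(num_partitions):
        chosen = []
        for i in range(copies):
            # Rotate the starting rack per partition so load spreads evenly.
            rack = racks[(partition + i) % len(racks)]
            nodes = nodes_by_rack[rack]
            node = nodes[(partition // len(racks)) % len(nodes)]
            chosen.append((rack, node))
        placement[partition] = chosen
    return placement


def surviving_partitions(placement, failed_rack):
    """Partitions that still have at least one copy outside the failed rack."""
    return [
        partition
        for partition, copies in placement.items()
        if any(rack != failed_rack for rack, _ in copies)
    ]
```

With two racks of two nodes each and one replica per partition, every partition ends up with copies on both racks, so losing either rack to a power or switch failure still leaves one copy of each partition reachable.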

InfiniDB 4.5: Before we get to the technology news, the company formerly known as Calpont has been renamed InfiniDB. This matches the name of the company's massively parallel processing database management system, which has been made to run in the cloud and on Apache Hadoop as well as MPP clusters.

We were on to the name change when we included "InfiniDB," alphabetically, in our 16 Top Big Data Analytics Platforms collection. The high points of the InfiniDB 4.5 release announced this week include new Hadoop capabilities such as fast bulk loading for HDFS and Apache Sqoop integration with parallel extraction for bi-directional data load/unload. A new InfiniDB Enterprise Manager provides a unified console for monitoring and managing sources and system resources. New REST APIs support integration into various enterprise systems.

InfiniDB joins Pivotal (with HAWQ) in the camp of vendors running relational database engines on Hadoop. MapR and HP Vertica also joined that camp this week in a separate, Strata-related announcement covered Tuesday. The payoff is a fast SQL-on-Hadoop option that's likely to beat Impala and Hive on query speeds, but we have yet to see benchmarks or tests that prove that performance.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Copyright © 2020 UBM Electronics, A UBM company, All rights reserved.