Cloudera Boosts Hadoop Portfolio With Security, Data Update Offerings - InformationWeek

Cloudera is filling the gaps in its Hadoop portfolio with two new products. RecordService provides security management across multiple Hadoop data access apps, while Kudu combines fast analytics with data updates and slims workloads.

No longer just a place to keep big data, Hadoop is growing into a dynamic platform. Now, Cloudera is looking to keep it growing by providing two new pieces that address security and data updates.

On Monday, Sept. 28, Cloudera unveiled RecordService, which provides a single point of security management across multiple Hadoop data access apps. In addition, the company detailed a second product, Kudu, which combines fast analytics with data updates.

Kudu and RecordService are currently in beta. They are being offered for free as open source apps, and are to be donated to the Apache Software Foundation eventually.

Kudu is a high-speed storage engine that bridges HBase (an open source, non-relational database) and HDFS (Hadoop Distributed File System). "Kudu is the culmination of a three-year R&D effort," said Matt Brandwein, director of product marketing at Cloudera.

(Image: Danil Melekhin/iStockphoto)

Without Kudu, HBase and HDFS are hobbled by limitations. HDFS cannot change data once it is written, though it can append data to files. Updating means deleting and re-adding the files, Brandwein said. HBase is designed for rapid updating, but "it's not good for analytics."
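The write-once limitation Brandwein describes can be illustrated with a toy model. The Python sketch below is not HDFS code and uses no Hadoop API; `AppendOnlyFile` and `update_by_rewrite` are hypothetical names invented for illustration. It only shows why changing a single record in an append-only file means deleting the file and re-adding a rewritten copy.

```python
class AppendOnlyFile:
    """Toy model of a write-once file: records can be appended, never changed."""

    def __init__(self, records=None):
        self._records = list(records or [])

    def append(self, record):
        # Appending is the only supported mutation.
        self._records.append(record)

    def read(self):
        # Return an immutable snapshot of the file contents.
        return tuple(self._records)


def update_by_rewrite(store, name, key, new_value):
    """Change one record by building a full replacement file and swapping it in."""
    old_file = store[name]
    rewritten = AppendOnlyFile(
        (k, new_value if k == key else v) for k, v in old_file.read()
    )
    del store[name]          # delete the old file
    store[name] = rewritten  # re-add the rewritten copy
    return rewritten


# A dict stands in for the file system namespace (an assumption of this sketch).
store = {"events": AppendOnlyFile()}
store["events"].append(("user1", 10))
store["events"].append(("user2", 20))

# Changing one value rewrites every record in the file.
update_by_rewrite(store, "events", "user2", 25)
print(store["events"].read())
```

In this model the cost of a one-record change grows with the size of the whole file, which is the kind of overhead an update-friendly storage engine is meant to avoid.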

Kudu "enables the combination of updating and analytics," he said. It also simplifies Hadoop architecture by reducing two workloads to one, while still keeping the strengths of HDFS (storage) and HBase (building online applications). Bridging the two makes it possible to build, for example, a real-time online dashboard.

RecordService provides consistent security management across different data access apps, like Spark, Hive, and Impala. The challenge is that each has its own set of security guarantees when used without RecordService. Impala and Hive require control of "fine-grained data," while Spark gets by on coarser data security over rows and columns, Brandwein explained.

To solve this challenge, RecordService "sits between storage in Hadoop and accesses all engines in Hadoop." It brokers data requests, looking up permissions in Apache Sentry and presenting only the data the user is allowed to see. "In effect, it brings universal access control and enforcement to the system."

As a result, there are no loopholes a person could exploit by switching from one form of search to another. Each must follow the same pathway, passing through RecordService's filter.
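The brokering idea can be sketched in a few lines of Python. This is a conceptual illustration only, not the RecordService or Apache Sentry API: `PolicyStore`, `AccessBroker`, and the two "engine" functions are hypothetical names, and column-level filtering is an assumed policy model chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Stand-in for a central permission store (the article names Apache Sentry)."""
    # Maps (user, table) to the set of columns that user may read.
    allowed_columns: dict = field(default_factory=dict)

    def columns_for(self, user, table):
        return self.allowed_columns.get((user, table), set())


class AccessBroker:
    """Hypothetical broker: every engine reads through it, so one policy applies."""

    def __init__(self, policies, tables):
        self._policies = policies
        self._tables = tables  # table name -> list of row dicts

    def scan(self, user, table):
        allowed = self._policies.columns_for(user, table)
        if not allowed:
            raise PermissionError(f"{user} may not read {table}")
        # Return only the columns the user is permitted to see.
        return [{c: row[c] for c in row if c in allowed}
                for row in self._tables[table]]


# Two toy "engines" with no storage access of their own; both go via the broker.
def sql_engine_count(broker, user, table):
    return len(broker.scan(user, table))


def batch_engine_columns(broker, user, table):
    rows = broker.scan(user, table)
    return sorted(rows[0].keys()) if rows else []


tables = {"customers": [
    {"name": "Ann", "ssn": "000-00-0000", "city": "Oslo"},
    {"name": "Bo", "ssn": "111-11-1111", "city": "Lima"},
]}
policies = PolicyStore({("analyst", "customers"): {"name", "city"}})
broker = AccessBroker(policies, tables)

row_count = sql_engine_count(broker, "analyst", "customers")
visible = batch_engine_columns(broker, "analyst", "customers")
print(row_count, visible)
```

Because neither engine touches the tables directly, switching engines does not change what a user can see; the restricted `ssn` column is filtered out on every path.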

Hadoop customers want to store and analyze data on one platform, using one architecture instead of different architectures on different servers. Completing that single platform is the challenge. "The pieces are there," Brandwein said. It is more a question of Hadoop reaching maturity, with those pieces in their proper place and working together.

"Hadoop is rapidly completing. I don't think we are there yet," he said. "The vision is not Hadoop being another database. We are reinventing how analytics are done."

Hadoop began life as a way to store and process big data.

Its most common use is ETL (extract, transform, and load), according to a recent study done by AtScale. Now the goal is to provide "an end-to-end analysis chain," collecting data in one place and working with it in multiple ways, Brandwein said.

William Terdoslavich is an experienced writer with a working understanding of business, information technology, airlines, politics, government, and history, having worked at Mobile Computing & Communications, Computer Reseller News, Tour and Travel News, and Computer Systems ...
