MongoDB Upgrade Fills NoSQL Analytics Void - InformationWeek

8/29/2012
08:36 AM

Latest release of 10gen's database sidesteps complicated MapReduce processing with a new data-aggregation framework. That distances MongoDB from NoSQL rivals including Cassandra, HBase, and Riak.

10gen, the company behind the fast-growing MongoDB database, on Wednesday announced the general availability of a highly anticipated upgrade that promises easier analytic querying of a NoSQL database best known for speedy transactional performance.

The new release, MongoDB 2.2, is the production-ready result of a 2.1 developers' preview that has been beta tested by the MongoDB community since January. Key upgrades include a new real-time aggregation framework, new sharding and replication features for multi-data-center deployments, and improved performance and database concurrency for high-scale deployments.

The biggest news in the upgrade is clearly the new real-time data aggregation framework, which lets users directly query data within MongoDB without resorting to writing and running complicated, batch-oriented MapReduce jobs within the database.

"MapReduce works well when it's a complex analysis that you need to handle with batch processing, but if you're trying to do something simple like compute the average of a list of numbers, it's overkill," explained Jared Rosoff, director of product marketing at 10gen, in an interview with InformationWeek.

What was missing before 2.2, and indeed in most NoSQL databases, according to Rosoff, is routine query functionality for the kind of data-filtering and data-analysis tasks you would otherwise handle with SQL, that is, if you were using a relational database. That's exactly what the data aggregation framework provides: a collection of data operators that can handle 80% of the tasks that MongoDB developers used to handle with MapReduce, according to 10gen.
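Rosoff's "average of a list of numbers" example can be sketched in Python to show the contrast. This is only an illustration, not 10gen's code: the sample documents and the field names "price" and "avgPrice" are assumptions, and the pipeline is built as plain data rather than sent to a server.

```python
from functools import reduce

# Hypothetical sample documents; the "price" field is an assumption.
orders = [{"price": 10.0}, {"price": 20.0}, {"price": 60.0}]

# MapReduce style: map each document to a (sum, count) pair,
# then reduce the pairs into a single running total and count.
mapped = [(doc["price"], 1) for doc in orders]
total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), mapped)
mapreduce_avg = total / count

# Aggregation-framework style: one declarative $group stage using $avg.
# Against a live server, a driver such as PyMongo would take this list
# via collection.aggregate(pipeline); here it is only constructed.
pipeline = [{"$group": {"_id": None, "avgPrice": {"$avg": "$price"}}}]

print(mapreduce_avg)  # 30.0
```

The MapReduce version needs two hand-written functions even for a plain average, while the aggregation version states the intent in a single stage.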

The MongoDB query language is not SQL, but 10gen describes it as a simple, expressive language with a straightforward syntax for efficient querying. Examples of simple query statements include "sum," "min," "max," and "average" (written $sum, $min, $max, and $avg in the framework). These sorts of operators would be familiar to any database veteran or analyst, and they're applied in a real-time data-processing pipeline that delivers sub-second performance, according to 10gen.

Other available query statements include "project," which selects the desired attributes and ignores everything else. "Group" combines documents that share a chosen attribute so that values such as sums or averages can be computed for each group. "Match" is a filter that can be used to eliminate documents from a query. "Limit," "skip," and "sort" are used in much the same way they're used in SQL: to limit a query to a desired number of results, to skip over a given number of results, and to sort results alphabetically, numerically, or by some other value.
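To make the stages concrete, here is a minimal pure-Python sketch that mimics match, group, sort, skip, limit, and project over a list of dictionaries. It is an emulation for illustration only, not MongoDB's implementation; the sample data, field names, and helper functions are all hypothetical. The `pipeline` list shows how the same stages would be written in MongoDB's own operator syntax.

```python
from collections import defaultdict

# Hypothetical sample data; field names are assumptions for illustration.
sales = [
    {"item": "book", "region": "EU", "amount": 12},
    {"item": "shoes", "region": "US", "amount": 80},
    {"item": "book", "region": "US", "amount": 15},
    {"item": "mp3", "region": "EU", "amount": 1},
    {"item": "shoes", "region": "EU", "amount": 70},
]

def match(docs, field, value):
    """Filter: keep only documents whose field equals value."""
    return [d for d in docs if d.get(field) == value]

def group_sum(docs, key, field):
    """Group documents by key and sum field within each group."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[field]
    return [{"_id": k, "total": v} for k, v in totals.items()]

def sort(docs, field, descending=False):
    """Order documents by the value of field."""
    return sorted(docs, key=lambda d: d[field], reverse=descending)

def skip(docs, n):
    """Drop the first n documents."""
    return docs[n:]

def limit(docs, n):
    """Keep at most the first n documents."""
    return docs[:n]

def project(docs, fields):
    """Keep only the named fields of each document."""
    return [{f: d[f] for f in fields} for d in docs]

# The same stages expressed in MongoDB's aggregation operator syntax.
pipeline = [
    {"$match": {"region": "EU"}},
    {"$group": {"_id": "$item", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 2},
]

# Emulated run: EU sales only, totals per item, top two by total.
grouped = group_sum(match(sales, "region", "EU"), "item", "amount")
top_two = limit(sort(grouped, "total", descending=True), 2)
print(top_two)  # [{'_id': 'shoes', 'total': 70}, {'_id': 'book', 'total': 12}]
```

Each stage takes a list of documents and returns a new list, which is why the stages can be chained in any sensible order, much as the framework chains them in a pipeline.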

SQL veterans might ask, "Why not just use a relational database?" Rosoff says MongoDB is displacing products like Oracle Database and Microsoft SQL Server because of its scalability and flexibility. MongoDB runs on low-cost, highly distributed nodes of commodity hardware much like Hadoop, but unlike that data-processing platform, it's a database that can run applications.

Like other NoSQL databases, MongoDB gives users the flexibility to store and recall any type of data without the rigid constraints of a fixed data model--something that relational databases demand. New data types including complex data and loosely structured textual information can be added without first conforming the data to a predefined schema.

"Customers frequently tell us they've spent as long as a year trying to model complicated schemas in relational databases but they just couldn't make it work or perform," Rosoff said. "People are adopting Mongo because every document stored in the database can have slightly different fields, and documents can have more structure than rows in a relational database."

A good use case for NoSQL is modeling a product catalog for an e-commerce site. If that site sells books, shoes, furniture, and MP3s, the catalog will require many different fields to cover diverse product attributes, but at the same time, all of those products have product IDs, prices, and descriptions. That's hard to structure in a relational database, but "you can model that type of data much more simply in Mongo," Rosoff said.
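The catalog idea can be sketched as a handful of Python dictionaries standing in for documents. The product IDs, prices, and attribute names below are invented for illustration; the point is only that every document shares a few common fields while carrying its own product-specific ones.

```python
# Hypothetical catalog documents; all values and field names are made up.
catalog = [
    {"product_id": "B-100", "price": 14.99, "description": "Paperback novel",
     "author": "A. Writer", "pages": 320},
    {"product_id": "S-200", "price": 59.00, "description": "Running shoes",
     "size": 42, "color": "blue"},
    {"product_id": "M-300", "price": 0.99, "description": "MP3 single",
     "duration_sec": 215},
]

# Fields every product shares, per the article: ID, price, description.
COMMON = {"product_id", "price", "description"}

# Every document carries the common fields...
all_have_common = all(COMMON <= doc.keys() for doc in catalog)

# ...while the remaining fields differ from product to product.
extra = {doc["product_id"]: sorted(doc.keys() - COMMON) for doc in catalog}

print(all_have_common)  # True
print(extra)
```

In a relational design, the product-specific attributes would typically call for extra tables or many nullable columns; here each document simply includes the fields it needs.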

The new aggregation framework promises to fill the need for fast, simple querying in MongoDB, but more complex analyses can still be handled with MapReduce processing within the database. And for really complex data processing and analyses, there's a MongoDB-Hadoop connector that lets users handle those tasks on separate Hadoop clusters.

New multi-data-center support features included in the 2.2 release give administrators tighter control over data location to meet compliance demands. For example, certain privacy regulations in Europe demand that customer data be stored within the country or continent. Tag-aware database sharding and replication features in 2.2 support location-based storage and retention. In addition, different types of data can be assigned to content-appropriate hardware, as in fast storage for frequently accessed data and low-cost options for archival information.

MongoDB 2.2 performance and concurrency are said to be improved with a new locking architecture that 10gen says better handles frequent database reads and writes. Locking protects data integrity by making sure that one transaction is completed before another can update the same information. By using a more fine-grained locking approach and detecting when data is on disk rather than in RAM, 10gen says Mongo 2.2 handles more disk input and output demands under load without degrading database performance.

The performance gains and multi-data-center support features are table stakes for big data deployments that 10gen had to deliver. The data aggregation framework distances MongoDB from NoSQL competitors including Cassandra, HBase, and Riak, according to Rosoff. Gartner analyst Merv Adrian told InformationWeek he's cautiously optimistic that 10gen will deliver what's promised.

"Time will tell if 10gen's '80% of the use cases' assertion proves out, but there is no doubt that grouping and aggregation functions do make up a lot of the intended [analytic] work in their customer and prospect base," Adrian said.
