Document-oriented database can scale out as demand increases by adding nodes to a server cluster.

Charles Babcock, Editor at Large, Cloud

March 31, 2010

Startup 10gen, sponsor of the MongoDB open source project, this week launched commercial support for MongoDB, a NoSQL-style data system for use with large Web applications or very large data sets.

MongoDB derives its name from "humongous" and is a document-oriented database that can scale out as demand increases by adding nodes to a server cluster. In addition to rapid scale-out, the project includes traditional database functions, such as dynamic queries and indexes, but does not rely on relational database-style schemas.
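
As a rough illustration of what dynamic queries and indexes without a fixed schema can look like, here is a minimal sketch in plain Python. It does not use MongoDB or its API; the collection contents and the `find` and `build_index` helpers are hypothetical names invented for this example.

```python
from collections import defaultdict

# Hypothetical sample data: documents in one collection need not share fields.
collection = [
    {"name": "ann", "city": "New York"},
    {"name": "bob", "city": "Boston", "age": 41},
    {"name": "cy", "tags": ["admin"]},
]


def find(docs, **criteria):
    """Dynamic query: return documents whose fields equal every criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def build_index(docs, field):
    """Map each value of `field` to the documents that hold it.

    Documents that lack the field are simply left out of the index.
    """
    index = defaultdict(list)
    for d in docs:
        if field in d:
            index[d[field]].append(d)
    return index


# An index answers the same question as a scan, without visiting every document.
city_index = build_index(collection, "city")
boston_by_scan = find(collection, city="Boston")
boston_by_index = city_index["Boston"]
```

The query criteria are chosen at call time rather than declared in advance, and a document missing a field is skipped rather than rejected, which is the sense in which no schema is required.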

10gen aims to boost enterprise adoption of MongoDB through professional services and technical support offerings, said Dwight Merriman, co-founder and CEO of 10gen.

Merriman is a co-founder and former CTO of DoubleClick, where he architected the online advertising management system sold to Google in 2007 for $3.1 billion.

MongoDB joins the emerging class of non-relational data sorters and managers, sometimes referred to as NoSQL systems because of their departure from the strict rules of relational databases.

NoSQL systems seek to separate large-scale online data handling from relational database systems. Other efforts include the Apache Software Foundation's open source Cassandra project, initially developed at Facebook; Apache's CouchDB open source document data system; Google's Bigtable; and Amazon.com's internal Dynamo system.

Several social networking sites, including Twitter, Facebook and Digg, have recently announced that they are moving away from MySQL to rely more heavily on Cassandra. These systems trade the guarantee of proper transaction handling for fast updates and the ability to produce fast, large-scale reads that keep up with hits on a Web site. It's possible to get two different answers to the same query with such systems, critics say, because their data lacks referential integrity.

But advocates say they don't use their NoSQL systems for transactions, and that asynchronous updates yield many performance and scaling advantages. The NoSQL systems avoid joins, one of the precise but performance-inhibiting features of relational systems. A join combines related data from different database tables.
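
To make the join trade-off concrete, here is a minimal sketch in plain Python, not tied to any particular database. The sample posts and comments and the helper names `join_posts_comments` and `comments_for` are hypothetical, made up for illustration.

```python
# Relational style: related data lives in separate tables, linked by a key.
posts = [{"post_id": 1, "title": "Launch day"}]
comments = [
    {"post_id": 1, "author": "ann", "text": "Congrats"},
    {"post_id": 1, "author": "bob", "text": "Nice"},
]


def join_posts_comments(posts, comments):
    """Nested-loop join: pair every post with its comments via post_id."""
    return [
        {"title": p["title"], "author": c["author"], "text": c["text"]}
        for p in posts
        for c in comments
        if p["post_id"] == c["post_id"]
    ]


# Document style: the comments are stored inside the post itself,
# so reading them back requires no join.
post_document = {
    "title": "Launch day",
    "comments": [
        {"author": "ann", "text": "Congrats"},
        {"author": "bob", "text": "Nice"},
    ],
}


def comments_for(document):
    """Read embedded comments straight out of one document."""
    return [{"title": document["title"], **c} for c in document["comments"]]


joined = join_posts_comments(posts, comments)
embedded = comments_for(post_document)
```

Both approaches return the same rows here. The difference is that the join must match keys across two collections at read time, while the embedded form retrieves everything with a single document lookup.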

"We don't think this is going to be a small niche. It will creep out into the enterprise," Merriman said in an interview as 10gen launched the support offering March 29. When his company first started making MongoDB available for free download last year, downloads numbered a few hundred a month. But traffic has rapidly built up to a level of 30,000 downloads a month, he said. The activity reflects an interest in systems "where you want high performance, moving binary data (such as unstructured documents or video) is a really good fit," Merriman said. MongoDB won't be used in futures trading systems since it does not focus on completing a long-running transaction, he added.

The NoSQL systems are easier to develop for than relational databases and give their adopters increased agility in dealing with masses of data gleaned from an active Web site, Merriman said. For example, MongoDB stores documents as JSON (JavaScript Object Notation) objects, a lighter-weight and easier-to-read format for storing documents than XML.
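
The JSON-versus-XML comparison can be seen by serializing the same record both ways. The sketch below uses only Python's standard `json` and `xml.etree.ElementTree` modules; the record and its field names are hypothetical, and the XML layout is just one of many possible encodings.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration.
article = {
    "title": "Example story",
    "tags": ["nosql", "documents"],
    "word_count": 650,
}

# JSON: the nested structure maps directly onto objects, arrays and numbers.
as_json = json.dumps(article)

# XML: the same data needs an element per field plus wrapper elements for
# the list, and numbers must be written out as text.
root = ET.Element("article")
ET.SubElement(root, "title").text = article["title"]
tags = ET.SubElement(root, "tags")
for tag in article["tags"]:
    ET.SubElement(tags, "tag").text = tag
ET.SubElement(root, "word_count").text = str(article["word_count"])
as_xml = ET.tostring(root, encoding="unicode")

# Reading the JSON back restores the original types with no extra mapping.
restored = json.loads(as_json)

print(as_json)
print(as_xml)
```

For this record the JSON form is shorter than the XML form and round-trips to the same Python structure, which is roughly the developer convenience the comparison points to.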

"It's clean and elegant and developers like using it," Merriman said. It's in production use at SourceForge, Electronic Arts, the New York Times, and Boxed Ice.

10gen will charge $5,000 per server for a year of "gold-level," around-the-clock MongoDB support. Basic and silver support are also available.

IDC analyst Carl Olofson thinks the architecture's got potential.

"This technology will be used for such activities as data warehouse definition and preparation, master data management initiatives and large scale enterprise data indexing and cataloging projects," Olofson wrote in a research note.

About the Author(s)

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.
