Non-RDBMS Emerges as an Analytics Data Choice - InformationWeek


Non-RDBMS Emerges as an Analytics Data Choice

A study finds that 70% of all analytics data now comes from non-relational sources. What does that mean for you?

A recent study released by big data visual tools developer Zoomdata shows that data analytics has crossed the digital Rubicon, with non-relational database management systems (non-RDBMS) now accounting for 70% of analytics data sources.

According to the study, approximately 40% of data sources are now composed of modern non-RDBMS sources like Hadoop, NoSQL, in-memory, and search databases. Another 20% are columnar/MPP analytic databases, and 10% are cloud native data stores, such as Amazon Redshift and Google BigQuery. Only 30% of data analytics is still performed against traditional relational database management systems, the study notes. The research was conducted for Zoomdata by O'Reilly Media, with 875 respondents participating in the survey.


The dramatic shift toward non-RDBMS and away from traditional relational databases such as MySQL, PostgreSQL and SQLite3 for analytics applications isn't likely to reverse or even slow down, given the way organizations now collect information. "Data is no longer homogenous, and the volume of data collected has grown exponentially over the last few decades," says Jamie Griffiths Craighead, a business and computer systems instructor at Beacon College in Leesburg, Fla. "Today, we have interconnected systems streaming non-normalized, non-homogenous data from many sources at a rate where there may be tens of millions of new records per day."

Exploding data

A prime force driving organizations away from RDBMS technology is the sheer scope and scale of the current data explosion. "We have more systems and devices generating more data than ever," says Tim Platt, vice president of IT business services at Virtual Operations, an IT support and management services provider based in Winter Park, Fla. "There’s a lot of (data); it takes diverse forms and it comes in quickly in many cases."

Traditional RDBMSes have long struggled to accommodate scalability and scope issues, and are also dogged by numerous other disadvantages. "They are often licensed, proprietary software, with huge licensing fees tied to CPUs or cores and, even then, they struggle to scale horizontally," Platt explains. "Many times, the only option is to buy a bigger server with more CPU, more RAM, and more storage."

Worse yet, RDBMS's greatest strength -- data integrity -- has now become its biggest weakness. "To ensure consistent entry of data, (RDBMS) requires a strict data model enforced by tons of referential data relationship constraints," notes Gavin Woods, director of consulting at PITSS, a Troy, Mich.-based Oracle systems data conversion and modernization firm. Although still preferred in many use cases, RDBMS's data model burden emerges as a serious limitation in cases where an organization requires flexibility and databases that can be deployed over multiple instances nationwide or worldwide. "RDBMS does not fit this bill; enter the non-relational database," Woods says.

Multiple benefits

Non-RDBMS databases, such as NoSQL, offer a key benefit to application developers: ease of access. "Relational databases have a fraught relationship with applications written in object-oriented programming languages like Java, PHP and Python," observes Milind Shah, field CTO at cloud consulting services provider Stratiform of El Segundo, Calif. "NoSQL databases are often able to sidestep this problem through APIs, which allow developers to execute queries without having to learn SQL or understand the underlying architecture of their database system," he explains.

Instead of relying on tables, many non-RDBMS databases are document-oriented. "This way, non-structured data -- such as articles, photos, social media data, videos, or content within a blog post -- can be stored in a single document that can be easily found, but isn’t necessarily categorized into fields like a relational database does," Shah says. Such an approach is highly intuitive, yet storing vast amounts of data in bulk requires extra processing effort and more storage than highly organized data. "That’s why Hadoop, an open-source computing and data analysis platform capable of processing huge amounts of data in the cloud, is so popular in conjunction with NoSQL database stacks," Shah says.
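
To make the document-oriented idea concrete, here is a minimal sketch in Python that uses plain dictionaries as a stand-in for a document store. The collection, its field names, and the find helper are all hypothetical illustrations, not the API of any particular product.

```python
import json

# A "collection" is just a list of documents. Each document is a dict and
# may carry different fields from its neighbors (there is no fixed schema).
posts = [
    {"_id": 1, "type": "article", "title": "Why NoSQL?", "tags": ["nosql", "analytics"]},
    {"_id": 2, "type": "photo", "caption": "Data center", "width": 1920, "height": 1080},
    {"_id": 3, "type": "article", "title": "Scaling out", "tags": ["clusters"], "likes": 42},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion.

    A document that lacks a field simply does not match; nothing fails.
    """
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


articles = find(posts, type="article")
print([doc["title"] for doc in articles])
print(json.dumps(find(posts, likes=42), indent=2))
```

Note that the photo document has no title and only one article has a likes field, yet all three sit in the same collection and can be queried together.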

Another key benefit is that many non-RDBMS systems can be made to scale horizontally instead of vertically, allowing relatively low-cost servers to be combined into a single, powerful cluster. "It’s generally more cost effective to stand up four eight-core servers than to stand up a single 32 core server," Platt says. "Therefore, it’s more cost effective to scale, but the other benefit is that the data -- and processing power -- can be partitioned in ways that it can be processed in parallel, which means incoming data can be processed quicker, or analysis queries can run quicker."
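
The partitioning Platt describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any specific database does it: the four-node cluster size, the device and reading fields, and the helper names are hypothetical, and threads stand in for separate servers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # assumed cluster size, echoing the four-server example


def node_for(key: str) -> int:
    """Map a record key to a node using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def partition(records):
    """Split records into one shard per node, keyed by device id."""
    shards = [[] for _ in range(NUM_NODES)]
    for record in records:
        shards[node_for(record["device"])].append(record)
    return shards


def shard_total(shard):
    """Work done independently on each node: sum that node's readings."""
    return sum(record["reading"] for record in shard)


records = [{"device": f"sensor-{i % 10}", "reading": i} for i in range(1000)]
shards = partition(records)

# Threads stand in for separate servers; each processes only its own shard.
with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    partial_totals = list(pool.map(shard_total, shards))

# Combining the per-node results gives the same answer as one big scan.
total = sum(partial_totals)
print(total)
```

Because every record for a given device lands on the same node, per-device work never has to cross node boundaries, which is what lets the shards run in parallel.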

By working directly and natively with non-RDBMS data stores, data analysts can expand their skillsets and value. "For example, analytics users that understand how to leverage graph queries can derive deep network structure insight and wide relationship analysis over graphed data that simply can’t be computed on relational schema structured data," says Mike Matchett, a market analyst at research firm Small World Big Data, based in Hopkinton, Mass. "Non-RDBMS solutions can solve great performance challenges, tackle huge scales of data, help mine value from a wider variety of data types and are essential for web-scale, real-time, graph structured applications," he adds. Additionally, the overwhelming majority of non-RDBMS solutions are open source, allowing users to tackle vast and varied amounts of data not only directly, but also more cost-effectively.
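
As a rough illustration of the kind of relationship query Matchett has in mind, the Python sketch below walks a small network breadth-first to find everyone within a given number of hops of a starting person. The graph data and the within_hops function are hypothetical; a real graph database would expose this through its own query language.

```python
from collections import deque

# Hypothetical "follows" network stored as an adjacency list.
graph = {
    "ana": ["ben", "cho"],
    "ben": ["dev"],
    "cho": ["dev", "eli"],
    "dev": ["fay"],
    "eli": [],
    "fay": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # report only the other people
    return distance


print(within_hops(graph, "ana", 2))
```

In a table-based layout, each additional hop typically means another self-join or a recursive query, whereas here the traversal simply follows the stored links.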

RDBMS: Not dead yet

Although fading, RDBMS isn't likely to vanish anytime soon. Transactional consistency, in particular, remains a traditional RDBMS stronghold. "If your data is structured in a consistent fashion and you don’t have scalability issues, a traditional RDBMS might be the best solution," Platt says. He notes that it's also easier for organizations to find experienced traditional RDBMS database administrators, data modelers and developers. "The tool sets and features of these platforms are very mature," he adds.

RDBMS is also still king -- at least for the time being -- for organizations' core systems of record, which demand the exactness and certitude that RDBMS continues to offer. "The evolution here, though, is that an RDBMS doesn’t handle all data well, and data consumers will always want to work with as much data as they can," Matchett says.

Still, as time goes on, RDBMS's hold is rapidly weakening. "If your data requirements aren’t clear at the outset, or if you’re dealing with massive amounts of unstructured data, you may not have the luxury of developing a relational database with clearly defined schema," Shah says. Think of non-relational databases more like file folders, assembling related information of all types. "If a WordPress blog used a NoSQL database, each file could store data for a blog post: social likes, photos, text, metrics, links, and more," Shah says.

To keep legacy RDBMS deployments alive, some tool providers have begun teaching new tricks to their old offerings. "Oracle’s MySQL has added some non-RDBMS-like capabilities in table fields that can be configured to store searchable JSON documents," Craighead says. For comparison, MongoDB, one of the most popular non-RDBMS offerings, natively stores data as collections of JSON-like documents. "Non-relational data will continue to grow and we may see more hybrid database systems as traditional RDBMS add non-relational capabilities and non-relational systems add some features from traditional RDBMSes," Craighead observes.

Erik Gfesser, principal architect at Chicago-based IT consulting firm SPR Consulting, also sees a growing trend toward further hybridization. "Different types of processing, spanning the transactional to analytical spectrum, can be performed efficiently enough so that the need to use separate database products is lessened," he says.

Craighead notes that the trend toward non-RDBMS tools shouldn’t have any negative impact on data analytics users, since many analytics products now include support for non-relational data stores. "The positive impact for analytics users is the additional data that can be made available for analysis and increased query speed," Craighead says. "Non-RDBMS allows data to be stored in such a way that the need to perform join operations across tables or databases is reduced, leading to significant speed improvements."
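
Craighead's point about joins can be shown with a small, hedged comparison in Python. The customers, orders, and order_docs data, along with both function names, are hypothetical; the sketch only contrasts the two layouts and makes no claim about real-world speedups.

```python
# Normalized layout: two "tables" linked by a customer id.
customers = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 250.0},
    {"order_id": 101, "customer_id": 2, "total": 75.5},
    {"order_id": 102, "customer_id": 1, "total": 40.0},
]


def totals_with_join(orders, customers):
    """Look up the customer row for every order (the join step)."""
    totals = {}
    for order in orders:
        name = customers[order["customer_id"]]["name"]
        totals[name] = totals.get(name, 0.0) + order["total"]
    return totals


# Denormalized layout: each document embeds the customer data it needs.
order_docs = [
    {"order_id": 100, "customer": {"id": 1, "name": "Acme"}, "total": 250.0},
    {"order_id": 101, "customer": {"id": 2, "name": "Globex"}, "total": 75.5},
    {"order_id": 102, "customer": {"id": 1, "name": "Acme"}, "total": 40.0},
]


def totals_without_join(order_docs):
    """Aggregate directly; no second lookup is required."""
    totals = {}
    for doc in order_docs:
        name = doc["customer"]["name"]
        totals[name] = totals.get(name, 0.0) + doc["total"]
    return totals


print(totals_with_join(orders, customers))
print(totals_without_join(order_docs))
```

The trade-off is duplication: the customer name now lives in every order document, so a rename has to touch all of them rather than a single row.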

Some like it hot

Choosing between RDBMS and non-RDBMS requires carefully examining the analytics task at hand, as well as future analytical needs. "It’s common on development projects for someone to want to implement a NoSQL database for the sole reason that it’s a hot new technology," Platt says. Yet that’s never the right way to make a decision. Sometimes, the best decision is to choose both technologies. "We see projects that combine both relational DBs, where it makes sense, and NoSQL, where it makes sense," Platt says. "You don’t need a 'one or the other' approach."

The database product selection process should always take into account how the product is going to be used in the real world, as well as who will be expected to provide long-term maintenance. "Enterprises should be careful not to adopt technologies simply because they view them as being commonplace, or because a handful of individuals advocate usage," Gfesser says.

Performing due diligence before product selection will likely pay big dividends down the road. "As a consultant, I've seen many instances in which clients joined the bandwagon rather than first performing due diligence, and this typically doesn't end very well," Gfesser says. "As someone who periodically attends technology focused meetups, I'm reminded of a Hadoop consultant who last year commented to the audience that 'most Hadoop clusters out there are a mess; people do not know what they are doing'."


John Edwards is a veteran business technology journalist. His work has appeared in The New York Times, The Washington Post, and numerous business and technology publications, including Computerworld, CFO Magazine, IBM Data Management Magazine, RFID Journal, and Electronic ...
