Doug Henschen

Oracle: SQL Best For Big Data Analysis

Oracle admits there's a place for Hadoop and NoSQL, but it's sticking with its relational-database-centric view of big data opportunity.

The relational database is maligned and misrepresented by big-data zealots. That's the perspective Oracle EVP and Database Group leader Andy Mendelsohn shared at this week's Oracle OpenWorld event.

"A lot of people out there say, 'Relational databases are old, legacy products from 40 years ago,' and now you want something new, like NoSQL or NewSQL," Mendelsohn began in a broad-ranging, hour-long keynote. "You can rest assured that relational databases will keep evolving as needs and technologies evolve."

As the market-share leader in relational databases, Oracle is invariably cited by big-data vendors of every description as the incumbent competition. Indeed, many of those upstarts are thriving on IT budget dollars that might have otherwise gone to Oracle. Mendelsohn's message was that investments in Oracle will advance big data (and cloud) aspirations.

[Want more on Oracle Big Data Discovery? Read Oracle Unveils Hadoop Data Exploration Tool.]

"One of the big myths out there is you need NoSQL databases or Hadoop because relational databases can't deal with unstructured data," he said. "As you all know as Oracle customers, we added support for unstructured data 20 years ago. In fact, we've extended SQL so you can do smart things with unstructured data."

If you have geospatial coordinates, for example, you can ask the database to show you all the customers within a 10-mile radius of your store. "You can't do that with NoSQL databases," Mendelsohn said, allowing, "Maybe you can do that with a lot of work in some of the other systems."
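To make the radius query concrete, here is a minimal sketch of the idea in Python. It is not Oracle's spatial SQL: it uses the standard-library sqlite3 module as a stand-in database and registers a hypothetical `miles_between` function (haversine great-circle distance) so the filter can be written in the SQL itself. The table, the sample customers, and the store coordinates are all invented for illustration.

```python
import math
import sqlite3

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation

# Hypothetical sample data: (name, latitude, longitude)
SAMPLE_CUSTOMERS = [
    ("Ana", 40.05, -75.0),
    ("Ben", 40.0, -74.9),
    ("Cy", 40.5, -75.0),
]


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def customers_near(store_lat, store_lon, radius_miles=10.0):
    """Return names of customers within radius_miles of the store, sorted."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL so the distance test lives in the query.
    conn.create_function("miles_between", 4, miles_between)
    conn.execute("CREATE TABLE customers (name TEXT, lat REAL, lon REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", SAMPLE_CUSTOMERS)
    rows = conn.execute(
        "SELECT name FROM customers "
        "WHERE miles_between(lat, lon, ?, ?) <= ? ORDER BY name",
        (store_lat, store_lon, radius_miles),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(customers_near(40.0, -75.0))
```

A production spatial database would typically answer this with a spatial index rather than computing the distance for every row, which is what this sketch does.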

Oracle EVP Mendelsohn defends relational databases at Oracle OpenWorld.

Here's where NoSQL vendors Basho and DataStax would point out that the open-source databases that they support (Riak and Cassandra, respectively) are integrated with the Solr search engine, which supports geospatial searches. A separate query approach, yes, but it doesn't sound like a lot of work.

Mendelsohn also noted that Oracle has extended its flavor of SQL to handle JSON (JavaScript Object Notation), a fast-growing data type often used by web and mobile applications. With JSON support added to Oracle Database 12c in a July update, you can now add JSON columns to the relational database and use SQL extensions to pluck out attributes or fields from JSON documents.
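The general pattern of storing JSON in a column and pulling attributes out in SQL can be sketched as follows. This is an illustration only and does not use Oracle's JSON syntax: it relies on Python's standard-library sqlite3 module, and `json_field` is a hypothetical helper registered as a SQL function. The `orders` table and its documents are invented sample data.

```python
import json
import sqlite3

# Hypothetical flat JSON documents, one per order.
SAMPLE_ORDERS = [
    {"customer": "Ana", "city": "Austin", "total": 120},
    {"customer": "Ben", "city": "Boston", "total": 30},
    {"customer": "Cy", "total": 75},  # no "city" attribute
]


def json_field(document, key):
    """Return one top-level attribute of a JSON document, or None if absent."""
    return json.loads(document).get(key)


def large_order_cities(min_total=50):
    """Return (order id, city) for orders whose JSON 'total' exceeds min_total."""
    conn = sqlite3.connect(":memory:")
    # Register the helper so SQL can reach inside the JSON text column.
    conn.create_function("json_field", 2, json_field)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")
    conn.executemany(
        "INSERT INTO orders (doc) VALUES (?)",
        [(json.dumps(order),) for order in SAMPLE_ORDERS],
    )
    rows = conn.execute(
        "SELECT id, json_field(doc, 'city') FROM orders "
        "WHERE json_field(doc, 'total') > ? ORDER BY id",
        (min_total,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for order_id, city in large_order_cities():
        print(order_id, city)
```

Documents that lack the requested attribute come back as SQL NULL (None in Python), so the relational query still runs over records with differing shapes.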

Here's where a NoSQL vendor like MongoDB might point out that the JSON support that Oracle, IBM (with DB2), and other relational database vendors have introduced doesn't perform in quite the same way as a NoSQL database.

"You end up with compromises with most of these products, like you can't do joins or you can't use all of the indexes that they support, or you can't access

Next Page

and update individual fields in a record," said Kelly Stirman, MongoDB's director of products. "Even if they get better, you still can't scale these systems."

The model for scaling relational is "almost always larger hardware," according to Stirman, and even when there's a distributed option, like Oracle RAC, "it still requires shared storage, and it's not designed to be deployed across data centers."

Oracle does have its own NoSQL product, the Oracle NoSQL Database, and it was updated in April to a 3.0 release that Mendelsohn said "can go head-to-head with any NoSQL product." But he touted schema flexibility, not scale, as its calling. Oracle also has a Hadoop distribution (based on Cloudera) that runs on the Oracle Big Data Appliance. Sheer scalability is Hadoop's calling in Mendelsohn's book. But when it comes to accessing data, Mendelsohn said NoSQL and Hadoop fans are "creating problems for themselves" because they now have data fragmented across multiple platforms with no common language.

NoSQL products don't use SQL, so they offer "primitive, low-value APIs and simple filtering," Mendelsohn said. And Hadoop vendors started out by promoting MapReduce, "but it turned out to be too complicated for most people, and it's a slow, batch-processing environment."

Mendelsohn observed dryly that NoSQL and Hadoop vendors are "figuring out that SQL is not such a bad idea." NoSQL vendors are "inching toward table abstraction" while the Hadoop vendors have multiple "Little SQL" SQL-on-Hadoop projects.

"When you look at their SQL implementations and their maturity compared to SQL, there's a big difference in the power of the language, the performance, the query optimization, and so on."

Mendelsohn explains Oracle Big Data SQL, which queries across NoSQL, Hadoop, and Oracle Database.

There's a lot of truth in these statements, but Oracle has its own answer for these gaps with the Oracle Big Data SQL query tool, which is designed to run SQL queries across Hadoop, NoSQL databases (just Oracle's, currently), and Oracle Database. You don't have to move high-scale data from those other platforms to Oracle Database. You just query it in place.

"All your developers know how to program against it, all your standard BI tools and third-party tools just work, and it's how we've solved the problem of big-data analytics," Mendelsohn said. 

InformationWeek took a deep dive on Oracle Big Data SQL in July, and we came away impressed. Broad SQL access is a very good thing, and it's something other data-management vendors, namely Teradata with QueryGrid and Microsoft with PolyBase, are also working on. At Oracle OpenWorld the company also introduced a data-exploration and visualization tool for Hadoop called Oracle Big Data Discovery.

The good news for Oracle customers is that the company is acknowledging that there are other platforms in the world. You won't catch Mendelsohn or chairman and CTO Larry Ellison admitting to the cost advantages of NoSQL databases or Hadoop. But with Big Data SQL and Oracle Big Data Discovery, the company is providing tools that will help customers tap these platforms.

So there's progress, but the suggestion that SQL "solves big data analytics" doesn't do justice to all the data science, use of algorithms and machine learning, and other techniques unleashing big-data insight. Mendelsohn is a database champion, so it's no surprise to hear him touting SQL. Just keep in mind that SQL is important, but it's not the only important form of big-data analysis.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...
Copyright © 2021 UBM Electronics, A UBM company, All rights reserved. Privacy Policy | Terms of Service