IBM Bets On Apache Spark As 'The Future Of Enterprise Data' - InformationWeek

IBM Bets On Apache Spark As 'The Future Of Enterprise Data'

The key problem Spark resolves is access to data across the enterprise. IBM initiatives include providing courses to train 1 million data scientists and engineers to use it.

IBM is making a major commitment to the future of Apache Spark, with a series of initiatives announced today. IBM will offer Apache Spark as a service on Bluemix; commit 3,500 researchers to work on Spark-related projects; donate IBM SystemML to the Spark ecosystem; and offer courses to train 1 million data scientists and engineers to use Spark.

The commitment to Spark is "right in the heart of what [IBM] has been doing," said Rob Thomas, VP of product development for IBM Analytics, in an interview. That heritage harks back to the company's earlier commitment to Linux, and even further back to its DB2 database product, he said. But it is rare for IBM to make a technological bet such as Spark, he added.

"This is the future of enterprise data," Thomas continued. "Anyone using data will have to leverage Spark."

(Image: Geralt via Pixabay)

The key problem Spark resolves is access to data across the enterprise. A typical large corporation will have hundreds, if not thousands, of data sets residing in different databases across its IT systems.

A data scientist can certainly craft an algorithm to plumb the depths of any database. But "it takes a data scientist 90 days of work" to craft that algorithm, Thomas said. "Today, if you port it to another system, you are talking about another 90 days of work" to re-craft and adjust that algorithm in order to get it to work. Spark "eliminates that second 90 days," he said. A Spark-based system can seamlessly and transparently access and analyze any database, without additional development and delay.

Another virtue Spark possesses is ease of use. Developers can concentrate on building the solution, instead of building an engine from scratch.

IBM recently sponsored a hackathon during which more than 100 teams crafted new Spark-based apps in about 10 days. One team made a genomic cloud system to analyze DNA samples; another created a search engine to gauge public opinion based on sentiments perceived in text. Thomas pointed to these projects as "proof of concept" to show how quickly a competent team of two or three people can complete a project using Spark.

"The weakest part of Spark is the machine learning piece," Thomas noted. To that end, IBM will make available its SystemML machine learning technology to add learning capability to Spark apps, working with partner Databricks. This is not an algorithm library, but an engine that understands algorithms, Thomas said of SystemML.

While Spark looks promising, nothing will come of it without sufficient numbers of data scientists who actually use it. And data scientists don't grow on trees. IBM wants to educate about 1 million new users through a series of partnerships with AMPLab, DataCamp, MetiStream, Galvanize, and the Big Data University MOOC. The goal here is to make available a "data scientist's work bench" where users who know the R programming language can pick up Spark and its uses very quickly, Thomas said.

Ultimately, it falls to enterprises to make the best use of big data technology such as Spark. "Knowing the problem to solve—that will drive significant business value," Thomas said. CEOs are only beginning to understand how their data can be put to best use. Thomas offered the example of Moneyball, the 2003 book on how the Oakland Athletics sharpened their play of baseball through statistical analysis. "Data can make you think differently," Thomas said. And therein lies the quest for the advantages of insight.

William Terdoslavich is an experienced writer with a working understanding of business, information technology, airlines, politics, government, and history, having worked at Mobile Computing & Communications, Computer Reseller News, Tour and Travel News, and Computer Systems ...
