IBM Datapalooza Takes Aim At Data Scientist Shortage - InformationWeek

IBM's quest to create 1 million new data scientists goes forward with a three-day training event and a test bed for taking its show on the road in 2016.

IBM announced in June that it has embarked on a quest to create a million new data scientists. It will be adding about 230 of them during its Datapalooza educational event this week in San Francisco, where prospective data scientists are building their first analytics apps.

Next year, it will take its show on the road to a dozen cities around the world, including Berlin, Prague, and Tokyo.

The prospects who signed up for the three-day Datapalooza convened Nov. 11 at Galvanize, the high-tech collaboration space in the South of Market neighborhood, to attend instructional sessions, listen to data startup entrepreneurs, and use workspaces with access to IBM's newly launched Data Science Workbench and Bluemix cloud services. Bluemix gives them access to Spark, Hadoop, IBM Analytics, and IBM Streams.

Rob Thomas, vice president of product development, IBM Analytics, said the San Francisco event is a test drive for IBM's 2016 Datapalooza events. "We're trying to see what works and what doesn't before going out on the road."

Thomas said Datapalooza attendees were building out DNA analysis systems, public sentiment analysis systems, and other big data apps.

Apache Spark sits at the center of IBM's education for future data scientists.

(Image: matdesign24/iStockphoto)

In June, IBM contributed its SystemML machine-learning engine to the Spark platform so that Spark can be used to analyze incoming streams of machine-generated data. Spark can serve both as a platform for capturing and analyzing the data and as a launch pad for retrieving it from other types of data repositories for analysis.

Unlike Hadoop, which relies on data being stored to disk before retrieval for analysis, Spark can work with data held in random access memory, speeding the pace at which it can be retrieved and used. IBM spokesmen describe Spark as 100 times as fast as Hadoop when working with data in server memory.
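The difference between disk-first and in-memory processing can be illustrated with a minimal sketch in plain Python. This is not Spark or Hadoop code; the function names (`write_sample`, `passes_from_disk`, `passes_from_memory`) and the sample data are hypothetical, chosen only to show why repeated analysis passes tend to benefit from keeping a dataset in memory instead of re-reading it from disk each time.

```python
import os
import tempfile
import time


def write_sample(path, n=200_000):
    """Write n small integers, one per line (hypothetical sample data)."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i % 100}\n")


def load(path):
    """Read the whole file from disk and parse it into a list of ints."""
    with open(path) as f:
        return [int(line) for line in f]


def passes_from_disk(path, passes):
    """Disk-first style: every analysis pass re-reads the data from disk."""
    return [sum(load(path)) for _ in range(passes)]


def passes_from_memory(path, passes):
    """In-memory style: read the data once, then reuse it for every pass."""
    data = load(path)
    return [sum(data) for _ in range(passes)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        write_sample(sample)

        start = time.perf_counter()
        disk_results = passes_from_disk(sample, passes=10)
        disk_seconds = time.perf_counter() - start

        start = time.perf_counter()
        memory_results = passes_from_memory(sample, passes=10)
        memory_seconds = time.perf_counter() - start

        # Both strategies compute the same answers; only the I/O pattern differs.
        print("same results:", disk_results == memory_results)
        print(f"re-reading from disk: {disk_seconds:.3f}s")
        print(f"reusing in-memory data: {memory_seconds:.3f}s")
```

Timings will vary by machine and say nothing about the 100-times figure quoted above; the sketch only shows the general pattern of reading once and reusing data across passes.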

Thomas explained that most machine-learning systems are built on a data system that uses one set of algorithms and one data model. When data from a different machine or type of machine event is collected, it requires a different model. Spark with SystemML is much more flexible than other data platforms, Thomas said. With it, an existing system can be adjusted to analyze an altered data flow without requiring a whole new system.

Spark is so central to the way IBM sees the future of data management that the company is converting many of its internal systems to work on Spark. It has also converted 15 products to run on Spark so far, including its SPSS statistical analysis software, its DataWorks data preparation and refinement offering, and a product pricing software module that helps companies dynamically address complex pricing issues, Thomas said.

"We reduced the number of lines of code needed for DataWorks from 40 million to 5 million" by making use of the distributed data processing available in Spark, Thomas said. Spark also simplifies what prospective data analysts need to know to get started.

"We had some people who stayed here until 2 a.m. last night; they were that engaged," said Thomas.

But will three days of attending classes and programming with other budding analysts really turn out a data scientist? Thomas laughed at the question. "Most people who would like to do analytics have never built an application. Here they'll get the experience of building one," he said, and they will be ready to go on to their next project. That puts them on a more direct path to becoming a data scientist than many other possibilities, he said.

In addition to the Datapalooza, IBM now operates a Spark Technology Center in San Francisco.

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld, and former technology editor of Interactive ...

User Rank: Ninja
11/14/2015 | 6:32:07 PM
Re: Spark sought a better way to run MapReduce-type jobs
SPSS is a tool that has helped to revolutionize many of the social science subject areas. If more individuals could deploy compute resources to solve current issues, inefficiencies, and problems by building apps, that would also result in a revolution.
Charlie Babcock,
User Rank: Author
11/12/2015 | 7:47:56 PM
Spark sought a better way to run MapReduce-type jobs
Spark began at UC Berkeley as a project to speed up big data handling compared with, let's say, MapReduce. It moved into the Apache Software Foundation incubator in 2013 and became a full-fledged project in 2014. IBM endorsed Spark in a big way last June and founded the Spark Technology Center in San Francisco in its existing Market Street offices. The Datapalooza could have taken place there but instead was held closer to startup territory, about four blocks away at 44 Tehama.