Five Factors Shaping Data Science - InformationWeek

Commentary
8/12/2019
07:00 AM
Ryohei Fujimaki, CEO and founder of dotData

Five Factors Shaping Data Science

As data science evolves, key challenges are driving organizations to seek innovative solutions to compete in the new AI-driven economy.

In late 2018, a survey by Univa found that 96% of respondents expected an "explosion in machine learning projects" in production by 2020. International Data Corp. forecasts that spending on artificial intelligence and machine learning will grow to $57.6B by 2020. Fraud detection, customer analysis, churn prediction, and numerous other applications are driving the rapid growth of AI and ML. The world of AI, however, has a problem: A 2019 survey by Dimensional Research found that 80% of companies reported stalled AI and ML projects.

The following five factors are causing these slowdowns, each with its own set of challenges and opportunities:

1. Making data actionable for data science

The same Dimensional Research survey found that 96% of respondents cited data quality and data labeling as crucial problems slowing their AI and ML adoption. Data silos are especially troublesome for data science.

Businesses store vast amounts of data, but often in different lines of business, across disparate systems and with varying levels of leadership and governance. Whether through manual processes or by leveraging automation, the first struggle for data science teams is to access and collect relevant data from different sources. Chief information officers and chief data officers must lead the charge to make data actionable for data science. Mitigating challenges related to data integration, ETL, security, and data privacy will drive faster turnaround of data science projects and make data science quicker and more efficient.

2. Shortage of data science talent

A 2018 LinkedIn survey found a shortage of more than 150,000 people in the U.S. with data science skills. The rapid adoption of ML and AI and the shortage of labor are likely to exacerbate the talent problem. To produce meaningful results, organizations must leverage statistical knowledge, data management, engineering, and subject matter expertise to tackle data quality, architecture design, and model production. Finding this multi-talented unicorn is impossible. Given the complexity of data science, it's no wonder that 88% of data science graduates have a master's degree and 46% have a Ph.D. Addressing this problem requires expanded education as well as continued investment in growing the talent pool at both the corporate and governmental levels. New technologies to automate and accelerate the data science process also promise to reduce talent constraints.

3. Time-to-value must accelerate

The plodding pace of development also slows data science. Data science projects are iterative in nature due to the uncertainty of data and require a deep understanding of underlying business problems. Data scientists create a series of hypotheses to be tested and validated with actual business data by wrangling, cleansing, joining, combining, and aggregating data to identify data relationships and extract relevant patterns to build ML models. This process requires a rigorous trial-and-error approach to find the right answers, and it often involves multiple exchanges between business and data science teams that prolong projects. Accelerating the time-to-value of data science is critical to fulfilling the promise of AI and ML.
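
As a rough illustration of the cleansing, joining, and aggregating steps described above, here is a minimal Python sketch. The tables, the field names (customer_id, amount, churned), and the build_features helper are all hypothetical and chosen only for this example; they do not come from any particular product or dataset.

```python
from collections import defaultdict

# Hypothetical toy data; table and field names are illustrative assumptions.
customers = [
    {"customer_id": 1, "churned": True},
    {"customer_id": 2, "churned": False},
    {"customer_id": 3, "churned": False},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 35.0},
    {"customer_id": 2, "amount": 15.0},
    {"customer_id": 2, "amount": None},  # dirty record, removed during cleansing
    {"customer_id": 3, "amount": 50.0},
]


def build_features(customers, transactions):
    """Cleanse transactions, aggregate them per customer, and join to customers."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in transactions:
        if row["amount"] is None:  # cleansing: skip records with no amount
            continue
        totals[row["customer_id"]] += row["amount"]  # aggregation: total spend
        counts[row["customer_id"]] += 1              # aggregation: transaction count

    features = []
    for customer in customers:  # join: attach the aggregates to each customer
        cid = customer["customer_id"]
        features.append({
            "customer_id": cid,
            "churned": customer["churned"],
            "txn_count": counts[cid],
            "total_spend": totals[cid],
        })
    return features


features = build_features(customers, transactions)
for row in features:
    print(row)
```

Each new hypothesis (for example, that low spend predicts churn) typically means another pass through steps like these, which is one reason the iterations add up.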

4. Business users need transparency

While the benefits of AI and ML can be high, one of the challenges of data science is the frequent disconnect between ML models and the expectations of business users. Because it is difficult to explain how ML and AI models work and how they generate results, line-of-business (LOB) users often lack the transparency they need to trust the process. Providing greater clarity for users will be critical to bridging the gap between the "black box" of data science and user needs. Systems that better "verbalize" AI models will help, but a closer relationship between LOB users and data science teams will be just as important.

5. Improving the operationalization process

Lastly, the migration of data science models to production environments is fraught with impediments and challenges. Models that worked well in development often don't scale or don't work at all in production systems. The result is slow and tedious rework and "fine-tuning" of models. Even when models work in production environments, they degrade as data changes, leading to further model maintenance and rework. Integrating AI and ML models into production environments will require a shift in thinking that accelerates rework and optimizes production use.
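
One simple way to notice that production data has changed is to compare a feature's recent values against the values seen at training time. The Python sketch below is only an illustration under stated assumptions: the mean_shift and has_drifted helpers, the sample numbers, and the threshold of 2.0 standard deviations are all hypothetical, and real monitoring would usually rely on more robust statistical tests.

```python
import statistics


def mean_shift(training_values, production_values):
    """Return the shift in the mean, in units of the training standard deviation."""
    base_mean = statistics.mean(training_values)
    base_std = statistics.stdev(training_values)
    if base_std == 0:
        raise ValueError("training values have zero variance")
    return abs(statistics.mean(production_values) - base_mean) / base_std


def has_drifted(training_values, production_values, threshold=2.0):
    """Flag drift when the mean shift exceeds the (assumed) threshold."""
    return mean_shift(training_values, production_values) > threshold


# Hypothetical feature values, for illustration only.
training = [10.0, 12.0, 11.0, 13.0, 9.0]
stable = [11.0, 10.0, 12.0]
shifted = [20.0, 22.0, 21.0]

print(has_drifted(training, stable))   # False: the production mean matches training
print(has_drifted(training, shifted))  # True: the mean moved by more than 2 standard deviations
```

A check like this does not repair a degraded model, but it can signal when maintenance or retraining is worth scheduling.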

The world of data science is undergoing some radical changes. Increasing requirements for transparency, an ever-increasing workload from business users, and a continuing shortage of qualified data science experts are all putting more pressure on data science teams to accelerate processes, automate as much of their work as possible, and broaden adoption of the data science process by non-data scientists. Organizations relying on data science will have to put critical changes in place to effectively address each of these challenges and compete in the new AI-driven economy.

Ryohei Fujimaki is the Founder & CEO of dotData, a spin-off of NEC Corp., and the first company focused on delivering full-cycle data science automation for the enterprise. Fujimaki is a world-renowned data scientist and was the youngest research fellow appointed in the 119-year history of NEC.

 

 
