Breaking Silos and Curating Data for Impactful AI - InformationWeek


Commentary
10/7/2019 07:00 AM
Jaspreet Singh, Founder and CEO, Druva


AI requires both high-quality data and an infrastructure that ensures data is always available. Without that foundation, we'll never reach the future.

Image: WrightStudio - stock.adobe.com

There is a symbiotic relationship between data and artificial intelligence. Data forms the foundation of a successful AI implementation, and AI in turn interprets and refines that data. It is a constant feedback loop in which each affects the effectiveness of the other. For machine learning to have an impact, data needs to be curated, high-quality and easily accessible. Successfully training such a technology without bias is a bigger challenge than one might expect.

Architecting an IT infrastructure that can break down data silos and make information available and actionable, while at the same time ensuring security and compliance, is already a major challenge for enterprises. Add in the desire to run that data through machine learning and AI functions, and things become even more challenging -- especially in the age of cloud, when data is widely dispersed.

AI offers so much promise as an enterprise technology: taking on decision-making tasks and helping employees perform their jobs better. But to realize that future, organizations need to understand how to prepare their architecture and data for an AI-driven future.

Enterprise data challenge

Cloud has offered the enterprise near-limitless resources for compute and storage, making it possible to retain vast amounts of data, but this has been both a blessing and a curse. While it presents a tremendous opportunity to analyze data and derive insights into financial projections, customer demand and more, the sheer volume of data has made it difficult to manage and utilize for such functions. Factor in additional requirements from compliance to data quality control, and you have a rather complex situation.

The increasing adoption of cloud services in the enterprise, alongside the continued use of legacy on-premises and hybrid solutions, has created vast data silos that are often difficult to identify, let alone consolidate and analyze. IT teams may not even know these silos exist, and they have the potential to severely limit analytics and intelligence tools.

Combined, this has led to a situation where enterprises capture enormous amounts of data but know very little about it, including how much data is being stored or even where it lives. What many enterprises currently have is a complex web of on-premises and cloud data stores, each with its own management, storage, privacy and regulatory concerns. The reality is that as data becomes more fragmented, enterprises need to take a hard look at centralizing its management. This is the only way to get our arms around such an unwieldy amount of data and turn it into something that can positively impact the larger business.

Why data quality matters

The goal of machine learning is to perform data-driven tasks with a level of skill, precision and speed far greater than what a human counterpart could provide. In the same way a person can’t learn a skill from the wrong textbook, a machine learning process trained on a poorly managed data set will fail to learn anything valuable. Similarly, an incomplete data set can produce a model that is narrow or skewed. There’s a balance required when building these data sets.
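The effect of an incomplete data set can be sketched with a toy example. Here a deliberately simple stand-in "model" learns a normal range from training samples; the numbers, the two customer segments, and the threshold rule are all hypothetical, invented purely to illustrate how training on one silo's data skews what the model considers normal:

```python
import statistics

# Illustrative data: transactions from two customer segments.
full_data = [10, 12, 11, 95, 100, 98, 13, 97]   # both segments
skewed_subset = [10, 12, 11, 13]                # one silo: small buyers only

def learn_threshold(samples):
    """'Train' by learning mean + 2 population standard deviations
    as the cutoff above which a transaction is flagged as anomalous."""
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples)
    return mean + 2 * stdev

full_cutoff = learn_threshold(full_data)       # ~140: large buyers look normal
narrow_cutoff = learn_threshold(skewed_subset)  # ~14: large buyers look anomalous

# The model trained only on the skewed silo would flag every ordinary
# large-segment purchase as an anomaly; the full data set would not.
print(narrow_cutoff < 95 <= full_cutoff)  # True
```

The same bias appears, in subtler forms, whenever training data is drawn from only one silo of a fragmented data estate.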

AI, currently an incredibly exciting trend in the enterprise space, also can’t be built on incomplete, erroneous data sets. Much of what AI is meant to accomplish involves predictive decision-making, modeling and analysis, none of which are possible if data is incomplete, dirty or siloed. Algorithms trained to analyze a specific trend need access to as much good data as possible, which may be held in separate data silos. It’s like a student writing a research paper: they likely need to reference sources from different sections of the library, but having everything accessible under the same roof improves the process immensely.

Building an infrastructure to support AI innovation

Enterprises are collecting more data at a faster pace, and generating insights requires an approach to infrastructure that breaks down data silos and ensures high-quality data is readily available. IT departments need to broaden their focus beyond collection and retention and begin to emphasize architecture, management and curation. Specifically, that means creating a data lake that provides a single repository of data, as opposed to a siloed approach that puts critical information out of reach.

AI is one of the transformational technologies of the 21st century, and it promises to reshape modern businesses and mold the future of work. In fact, we are already seeing its impact in places like customer experience, where it helps create a customized and curated experience for each buyer. But it’s not a plug-and-play solution, and it requires both high-quality data and an infrastructure that ensures data is always available. Without that foundation, we’ll never reach the future.

Jaspreet Singh is the founder and CEO of Druva. An entrepreneur at heart, he bootstrapped the company, delivering the first and only cloud native data management offering that is disrupting the classic data protection market. Prior to starting Druva, Singh held foundational roles at Veritas and Ensim Corp. Additionally, he holds multiple patents and has a B.S. in Computer Science from the Indian Institute of Technology, Guwahati.
