Three Tips for Laying the Groundwork for Machine Learning - InformationWeek


8/21/2019
07:00 AM
Mathias Golombek, CTO, Exasol

Three Tips for Laying the Groundwork for Machine Learning

While machine learning may seem overwhelming and complicated, creating an infrastructure for ML projects is more achievable than many organizations think.

Machine learning has grown to have a significant impact on our daily lives, from Amazon’s home assistant Alexa collecting and analyzing information to anticipate our needs and Facebook suggesting who we should friend, to applications that protect us from credit card fraud and improve online shopping experiences.

Organizations want their data to do the heavy lifting for them, driven by the desire to save on costs, improve consistency and streamline operations. While ML technologies were previously perceived as an excessive expenditure, today they are seen as an investment in the business’s future and a competitive revenue driver.

Image: NicoElNino - stock.adobe.com

In order to stay competitive and successful, organizations have to invest in the right technologies and intelligently use the skills and data systems that they already have. The following three tips will help enterprises evaluate ML benefits and investments and make the most of the technology they already have. 

Get quality data and get it organized

For ML algorithms to offer informed judgments and recommendations on business decisions, the underlying database must provide a steady supply of clean, accurate, and detailed data. It’s important to remember that more data doesn’t necessarily mean better data. Quality always comes first. When the quality of data is low, the insights derived from it will be less valuable, as will the decisions organizations make based on those insights.
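As a rough illustration of what checking for "clean" data can mean in practice, the sketch below counts missing values and exact duplicate rows in a handful of records. The field names, the sample records, and the `quality_report` helper are all hypothetical, and a real pipeline would likely check far more (ranges, formats, freshness).

```python
from collections import Counter


def quality_report(rows, required_fields):
    """Count missing values per field and exact duplicate rows.

    `rows` is a list of dicts; `required_fields` lists the keys every
    record is expected to carry. Both are assumptions of this sketch.
    """
    missing = Counter()
    seen = set()
    duplicates = 0
    for row in rows:
        # A value counts as missing if it is absent, None, or an empty string.
        for field in required_fields:
            value = row.get(field)
            if value is None or value == "":
                missing[field] += 1
        # Two rows are duplicates if all required fields match exactly.
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


# Hypothetical sample records, invented for illustration only.
records = [
    {"customer_id": 1, "country": "DE", "revenue": 120.0},
    {"customer_id": 2, "country": "", "revenue": 80.5},
    {"customer_id": 1, "country": "DE", "revenue": 120.0},
    {"customer_id": 3, "country": "US", "revenue": None},
]

report = quality_report(records, ["customer_id", "country", "revenue"])
print(report)
```

Even a simple report like this makes the point that a larger dataset is not automatically a better one: two of the four sample rows have a gap, and one is a repeat.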

According to a 451 Research report, 22% of the companies surveyed have already implemented ML algorithms in their data management platforms, while 42% are planning to do so in the next 12 months. This shift in investment, focused on ensuring captured data is of the highest quality possible rather than simply casting the data net as wide as possible, is a stark industry change. Less than a decade ago, dedicated data quality services and tools were a niche offering, largely underused by data-heavy businesses. Now, they are front and center in the C-suite’s future plans.

As ML continues to progress, organizations need to ensure that they provide support for their data scientists and invest in the necessary technology to process ML algorithms. If data scientists do not have the correct resources, this momentum will falter. Organizations need to have a high-quality database as the first step in preparation for incorporating ML into their business processes.

Embrace Python

For many organizations, predictive analytics is a key motivator for investing in ML. Predictive analytics uses ML to mine large datasets and predict the outcome of future events. This function depends on the data scientists’ mastery of the appropriate programming language. And just how does one master anything? By studying, experimenting and learning from others.

Here is where Python, one of the most popular programming languages in the world according to the TIOBE Index, really stands out. Python has become popular mostly because of its simplicity, readability, versatility and flexibility. As millions of people around the world learn and use the language, more and more individuals and groups share programs, tips and entire algorithms with each other. Python’s network of users puts countless learning materials at the fingertips of organizations hoping to use and experiment with the language.

Ultimately, having one underlying data infrastructure that everyone across all teams can feed into and take from is the key. For the business intelligence team, this will typically be Structured Query Language (SQL). However, in order to succeed, data scientists must be able to run scripts on the data using their preferred language -- notably Python. This standardization and democratization of data means that organizations can apply ML across any and all parts of the business in more creative and experimental ways.
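To make the idea of one shared data source concrete, here is a minimal sketch in which a SQL query and a Python script both work against the same table. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for an organization's data platform; the `orders` table and its values are invented for illustration.

```python
import sqlite3
from statistics import mean

# In-memory SQLite database as a stand-in for the shared data infrastructure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 100.0), ("north", 140.0), ("south", 90.0), ("south", 30.0)],
)

# Business intelligence view: a plain SQL aggregate over the table.
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)

# Data science view: pull the same rows into Python and script against them.
amounts = [row[0] for row in conn.execute("SELECT amount FROM orders ORDER BY rowid")]
avg = mean(amounts)
above_average = [a for a in amounts if a > avg]

conn.close()

print(sql_totals)
print(avg, above_average)
```

Both teams read from the same table here, so there is no separate copy of the data to keep in sync; only the language used to ask the question differs.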

The benefits of hyperscale cloud

Although on-premises IT infrastructure can host many of the open-source frameworks used to create ML solutions, many organizations still lack the power and scalability to support them. If an organization is evaluating ML for a project, hyperscale cloud might be a good option to consider, since it offers consumption-based access to graphics processing unit (GPU) compute, which can dramatically accelerate the process of training a deep learning algorithm.

Once the requirement moves from batch analysis to real time, the flow of relevant data must keep pace with ML algorithms working in near real-time. Ensuring that workloads are supported throughout a project’s lifecycle and organizations have the ability to experiment with ML capabilities is essential, and cloud elasticity can be used to address that.

It has never been easier for organizations to expand into the cloud, as the big three public cloud providers -- AWS, Google and Microsoft -- all fight for ML business. Despite this, organizations still lag behind in exploiting the elastic scalability of the cloud to derive value from their data with ML.

While ML may seem overwhelming and complicated, creating an infrastructure for ML projects is more achievable than many organizations think. In fact, most organizations are already using the technologies they need, such as databases, programming languages, and Infrastructure as a Service, to lay the foundation for ML optimization.

Mathias Golombek joined Exasol in 2004 as a software developer, led the database optimization team and became a member of the executive board in 2013. Although he is primarily responsible for the Exasol technology, his most important role is to build a great environment where smart people enjoy building products.

 
