The Basics of CI/CD for Data Science and Machine Learning - InformationWeek

Pierre DeBois

The Basics of CI/CD for Data Science and Machine Learning

Continuous integration and continuous deployment are IT practices that encourage testing code often. Learn how these practices also shape data-driven initiatives.

The basics of how machine learning and data science should work often feel less than basic. Machine learning practitioners, from programmers to scientists, are learning how to apply advanced statistics and mathematics within the context of software programming. The result is that selecting good machine learning models becomes complex, and that complexity can conflict with the constraints management faces, be they objectives, deadlines, or limited resources for acting on a decision based on the model.

Fortunately, two developer practices, continuous integration and continuous deployment (CI/CD), give managers ways to guide machine learning and data science initiatives early in the development process, making truly beneficial model-based decisions possible.

Let’s look at the definition of CI/CD to understand how the paired processes impact machine learning.

Continuous integration is a practice in which code and any related resources are committed to a shared repository at regular intervals. Automated builds then verify these check-ins, helping to surface problems early in the development cycle.

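Continuous deployment is a practice in which software updates are built, tested, and made ready for release automatically. (Strictly speaking, stopping at release-ready is usually called continuous delivery; continuous deployment goes one step further and pushes each passing change to production without a manual step.) With developers and database teams working collaboratively and in parallel, continuous deployment paves the way for stable and consistent versions of software.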
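
To make the two definitions concrete, here is a minimal sketch of a pipeline that runs its stages in order and stops at the first failure. The stage names and the placeholder checks (`build`, `unit_tests`, `package`) are hypothetical and stand in for whatever a real project would run; an actual CI/CD service would supply this orchestration itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage passes


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (overall success, names of the stages that passed).
    """
    passed: List[str] = []
    for stage in stages:
        if not stage.run():
            return False, passed
        passed.append(stage.name)
    return True, passed


# Hypothetical placeholder checks; real stages would compile code,
# run a test suite, and build a release artifact.
def build() -> bool:
    return True


def unit_tests() -> bool:
    return sum([1, 2, 3]) == 6


def package() -> bool:
    return True


PIPELINE = [
    Stage("build", build),
    Stage("test", unit_tests),
    Stage("package", package),
]
```

Because a failing stage halts everything after it, a broken check-in never reaches the packaging step, which is the early warning the continuous integration definition describes.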

CI/CD is valuable because today’s business strategies depend on how ongoing software management affects the development of products and services. The agility needed to deliver functional software has pushed application design toward microservice architectures. Microservices are a set of development techniques that arrange an application as a collection of loosely coupled services. Because each service can be deployed independently, releases can go out frequently, even multiple times a day, without interrupting other parts of the software. The advantage to a business model is the ability to provide seamless updates.

The seamless updates of microservices can also complement data-related changes, such as software updates that bring the handling of associated data into line with privacy compliance requirements. This update capability allows data science and machine learning processes to be incorporated into CI/CD phases at the right time.

As a consequence, CI/CD-influenced projects have the opportunity to minimize technical debt: the cost that accumulates when teams focus on getting code written without considering the long-term consequences for maintenance and the business model. For example, a team could develop an app but not examine the steps needed to update the environment in which the app operates. Technical debt is the enemy of organizations that have multiple deployment environments (e.g., development, testing, production). It is also the enemy of data-driven initiatives, since data deployment environments raise concerns similar to those in software development, such as API documentation (in this case for data resources) and differing data types. Getting an overview of the data mining and transformation steps needed can become complex very quickly.

So where within a development process can managers contribute to CI/CD and help reduce that complexity? One great opportunity is in evaluating test processes like user acceptance testing (UAT), a test phase that evaluates user needs, business requirements, and software functionality. Managers can help the test team set the evaluation parameters for business requirements, leading to a robust methodology for continuously improving against those parameters. A project manager is usually assigned to work with developers on this effort. UAT can reduce development time and expenses, while CI/CD can show data management teams how changes to a model’s output could affect the customer experience with a service or product.
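
One way to picture evaluation parameters is as an automated acceptance gate. The sketch below is an illustration, not a prescribed method: the metric names (`accuracy`, `latency_ms`) and their limits are hypothetical values a manager and test team might agree on, and the check reports every criterion a candidate model misses.

```python
from typing import Dict, List, Tuple

# Hypothetical business-driven criteria: ("min", x) means the metric
# must be at least x; ("max", x) means it must not exceed x.
ACCEPTANCE_CRITERIA: Dict[str, Tuple[str, float]] = {
    "accuracy": ("min", 0.90),
    "latency_ms": ("max", 200.0),
}


def failed_criteria(
    metrics: Dict[str, float],
    criteria: Dict[str, Tuple[str, float]],
) -> List[str]:
    """Return a description of each criterion the metrics fail to meet."""
    failures: List[str] = []
    for name, (kind, limit) in criteria.items():
        value = metrics.get(name)
        if value is None:
            # A metric that was never reported cannot be accepted.
            failures.append(f"{name}: missing")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value} < {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value} > {limit}")
    return failures


def is_accepted(metrics: Dict[str, float]) -> bool:
    """A candidate passes only when no criterion fails."""
    return not failed_criteria(metrics, ACCEPTANCE_CRITERIA)
```

Keeping the thresholds in one named table means the business requirement can be reviewed and changed without touching the checking logic, and a pipeline can refuse to promote a model whenever `is_accepted` returns `False`.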

Experts indicate that other opportunities for managers to apply CI/CD practices are emerging. Ben Lorica, chief data scientist at O’Reilly Media, noted in his O’Reilly Strata conference keynote that tools specialized for machine learning will layer onto existing analytics. The trend will allow teams to build their capabilities incrementally and experiment with other architectures. Recent announcements by Microsoft Azure, Amazon Web Services, and Google, for example, emphasize faster model training, better workflow management, and greater security for project deployment.

Evaluating the programming languages used for those projects can aid in selecting complementary IDEs and identifying needs that teams share. If a team used R to develop models, for example, it would need a version control system to keep packages and dependencies updated and to maintain a documented history of the changes that drive decisions among the responsible teams.
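
The same dependency concern applies in any language. As a rough sketch in Python rather than R, the check below compares a version-controlled list of pinned packages against what is actually installed and reports any drift. The `name==version` pin format is an assumption modeled on common requirements files, and the package versions in the usage lines are purely illustrative.

```python
from typing import Dict, List, Optional


def parse_pins(text: str) -> Dict[str, str]:
    """Parse lines of the form 'name==version', ignoring comments and blanks."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"unpinned dependency: {line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def find_drift(pins: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """List every package whose installed version differs from its pin."""
    drift: List[str] = []
    for name, wanted in sorted(pins.items()):
        found: Optional[str] = installed.get(name)
        if found != wanted:
            drift.append(f"{name}: pinned {wanted}, found {found}")
    return drift


# Illustrative data only: the pin file would live in version control,
# and the installed versions would be read from the environment.
PINS = parse_pins("numpy==1.26.4\npandas==2.2.0  # data frames\n")
INSTALLED = {"numpy": "1.26.4", "pandas": "2.1.0"}
DRIFT = find_drift(PINS, INSTALLED)
```

Running a check like this on every check-in gives the responsible teams the documented history the paragraph above calls for: each change to the pin file is a reviewable commit, and any mismatch shows up before a model is retrained.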

All of these considerations affect how well a CI/CD workflow accommodates the time machine learning algorithms take to train on the data and return results for inspection.

Turning data into a valuable business decision is not simple. But as data transformations increasingly occur in applications and software-managed devices, managers are experimenting with software management techniques like CI/CD to keep complex machine learning models in step with good data management basics.

Pierre DeBois is the founder of Zimana, a small business analytics consultancy that reviews data from Web analytics and social media dashboard solutions, then provides recommendations and Web development action that improves marketing strategy and business profitability.