InformationWeek | Commentary | 7/12/2018 01:00 PM
Moshe Kranc, Chief Technology Officer, Ness Digital Engineering

Machine Learning Workflow: A New Product Category Is Born

Developing and deploying software based on machine learning is a very different animal in terms of process and workflow.

Machine learning (ML) is being touted as the solution to problems in every phase of the software development product lifecycle, from automating the cleansing of data as it is ingested to replacing textual user interfaces with chatbots. As software engineers gain more experience in developing and deploying production-quality ML solutions, it is becoming clear that ML development differs fundamentally from other types of software development.

The workflow begins with prototyping: the ML engineer creates experimental models, runs them on small samples of data, and shares them with domain experts and data scientists for feedback, using notebook tools like Jupyter or Zeppelin. Once the team has decided on a model that is worth scaling, the next step is to ingest, cleanse, and de-duplicate the data. The cleansed data is then divided into training data, which will be used to tune the model, and validation data, which will be used to validate the model.
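The split at the end of that phase can be sketched in a few lines of Python. The helper name and the 80/20 ratio below are illustrative assumptions, not details from the workflow described above:

```python
import random

def train_validation_split(records, validation_fraction=0.2, seed=42):
    """Shuffle cleansed, de-duplicated records and split them into
    training and validation sets (illustrative helper)."""
    rng = random.Random(seed)          # fixed seed -> reproducible split
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

# Example: 100 cleansed records -> 80 for training, 20 for validation
records = list(range(100))
train, validation = train_validation_split(records)
print(len(train), len(validation))  # 80 20
```

Keeping the validation set strictly separate from the training set is what makes the later accuracy measurement trustworthy.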

The ML engineer then trains the proposed model by feeding it large volumes of data. To accelerate this process, the training is run in parallel across many processors, with the intermediate results combined at the end. This phase can be iterative and may require tweaking the model and then restarting the training. The training may also need to be re-run at regular intervals after deployment to update the model or isolate a problem. Isolating a problem requires rolling back not just the model, but also the training data it was built from, a capability that traditional source control systems are not designed to provide.
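One way to make that model-plus-data rollback tractable is to fingerprint each training snapshot and record it alongside the model version. The registry below is a hypothetical sketch of that idea, not a real tool's API:

```python
import hashlib
import json

def snapshot_id(training_records):
    """Fingerprint the training data so a model version can be traced
    back to (and rolled back with) the exact data it was trained on."""
    blob = json.dumps(training_records, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

# Hypothetical registry mapping (model, version) -> data snapshot + weights
registry = {}

def register(model_name, version, training_records, weights):
    registry[(model_name, version)] = {
        "data_snapshot": snapshot_id(training_records),
        "weights": weights,
    }

register("churn", "v1", [{"x": 1}, {"x": 2}], weights=[0.1, 0.2])
print(registry[("churn", "v1")]["data_snapshot"])
```

The same fingerprint computed later over an archived copy of the data confirms it matches what the registered model actually saw, which is the property traditional source control does not give you for large datasets.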

The team then tests the accuracy of the trained model by running data through it and comparing the model’s predicted results with the actual results. Once the team is satisfied with the trained model’s accuracy, the model must be integrated with the target application and deployed on a scalable infrastructure so that it can respond to requests in production. Depending on the type of model and the deployment environment’s performance requirements, this may require mechanisms such as horizontal scaling, caching results, or deploying parallel versions of the model in multiple containers.
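The accuracy check at the start of that phase reduces to a simple comparison over the held-out validation data. A minimal sketch, with made-up spam/ham labels for illustration:

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known outcomes."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Predictions from the trained model vs. known labels (illustrative data)
predicted = ["spam", "ham", "spam", "ham"]
actual    = ["spam", "ham", "ham",  "ham"]
print(accuracy(predicted, actual))  # 0.75
```

Real teams usually track more than one metric (precision, recall, and so on), but all of them follow this same pattern of scoring predictions against known outcomes.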

Another distinguishing characteristic of ML software is that it is far more brittle than traditional software. ML algorithms are non-deterministic in nature and are highly sensitive to the characteristics of the data with which they were trained. If those characteristics change, the model may lose its accuracy and need to be replaced by an alternative model. A second cause of ML software’s brittleness is that every step is tightly dependent on every other step, so the norm is “Change Anything Changes Everything.”

To meet these challenges, many engineering teams have taken existing open source tools and wired them together to create a “roll your own” ML operational environment, using tools such as Jupyter (ML notebooks), AirFlow (data pipelines), Docker (containerization), and Kubernetes (container orchestration). But the cost and complexity of this approach are not a good fit for every team. As an alternative, a new category of products has emerged that provides an end-to-end ML operational environment. Products in this category include:

Amazon SageMaker: a fully-managed platform that enables developers to easily build, train, and deploy machine learning models at scale.

Yhat ScienceOps: an end-to-end platform for developing, deploying and managing real-time ML APIs.

Pachyderm: an environment that automates all stages of developing machine learning pipelines.

These products can vastly simplify the process of creating and deploying ML algorithms with a few caveats:

  • These products enable an ML team to deploy an ML algorithm in production. The question must be asked: Is this desirable? Does the team have the requisite operational experience to drop code into its production environment, mediated by an automated software tool?
  • These products are new and have some rough edges in areas like stability and performance (like any new product). A good rule of thumb: Always do a proof of concept to see how the product works in your environment.
  • If you adopt one of these products, you are locked into that product’s roadmap. It may speed up your initial time to market, but it can limit your flexibility down the road.
  • Many of these products have an open source version. But, if you intend to use the product in production, you’ll quickly discover that you need the enterprise version.
  • Some of these products may suffer from a lack of focus, as they try to expand and solve problems beyond the ML development process. Make sure whatever product you choose can provide the depth of capabilities you need.

ML is poised for explosive growth in the enterprise, and ML workflow environment tools like the ones described above lower the barrier to entry. It will be interesting to see how this product family matures in the coming months.

Moshe Kranc is chief technology officer at Ness Digital Engineering, a company that designs, builds, and integrates digital platforms and enterprise software that help organizations engage customers, differentiate their brands, and drive profitable growth.
