So your company made the leap, and machine learning is now part of your arsenal. The data team is building models left and right, the results make sense, and everything looks ready for production. At last, you can operationalize your team’s work and show everyone in the conference room that this is just what the company needed to increase revenue.
You thought building models was the hardest part, but suddenly the rug has been pulled out from under you and you have landed on your back with a hard thump. The blow came from finding out that putting these models into production is the truly hard part of the process.
The data backs this up. Gartner reports that, on average, 85% of models never make it into production because of how difficult they are to operationalize.
So, what is MLOps?
MLOps is the art and science of bringing machine learning to production, and it is necessary to ensure that your data team is not part of the 85% whose models never reach production.
Because the MLOps practice still lacks a single agreed definition, Wikipedia provides the closest one: MLOps, or ML Ops, is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of “machine learning” and DevOps, the continuous development practice from the software field. Machine learning models are tested and developed in isolated experimental systems. When an algorithm is ready to be launched, MLOps is practiced between Data Scientists, DevOps, and Machine Learning engineers to transition the algorithm to production systems. (Wikipedia)
But this raises the question: why not continue using DevOps to operationalize machine learning? The reason is that there’s a significant difference between ML and traditional software:
ML is not simply code; it is code plus data.
An ML model, the file that you need to put into production, is created by applying an algorithm to a set of training data, and that data shapes how the model behaves in production. The model’s behavior also depends on the input data it receives at prediction time, which you can’t know in advance.
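To make the “code plus data” idea concrete, here is a minimal sketch. The toy algorithm and the function name `train_threshold_model` are illustrative assumptions, not part of any real library. The same code is run on two different training sets and produces two models that disagree on the same input.

```python
from statistics import mean

def train_threshold_model(values, labels):
    """Toy algorithm (an assumption for illustration): classify by a threshold
    placed halfway between the mean of each class in the training data."""
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    threshold = (mean(positives) + mean(negatives)) / 2
    return lambda x: 1 if x >= threshold else 0

# Identical code, different training data.
model_a = train_threshold_model([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])  # threshold 5.0
model_b = train_threshold_model([1.0, 2.0, 4.0, 5.0], [0, 0, 1, 1])  # threshold 3.0

# The same input gets a different prediction from each model.
print(model_a(4.0), model_b(4.0))  # 0 1
```

Nothing in the code changed between the two runs, which is why versioning code alone does not pin down how a model behaves.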
Challenges
The issue is that in an ML process, data and code live on the same plane, and even though code can be controlled in a development environment, data is always changing because of the multiple sources it is extracted from. The job of MLOps is to connect data and code in a controlled way.
The main challenges teams face when deploying an ML model into production are slow, brittle, and inconsistent deployments; lack of reproducibility; and reduced performance (training-serving skew).
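One symptom of training-serving skew is that the features a model sees in production no longer look like the ones it was trained on. The sketch below is only one simple way to spot that, under stated assumptions: the function name `skew_report`, the feature names, and the 0.5 tolerance are all hypothetical, and real monitoring usually relies on more robust statistics.

```python
from statistics import mean, pstdev

def skew_report(training, serving, tolerance=0.5):
    """Flag features whose serving mean has moved away from the training mean
    by more than `tolerance` training standard deviations."""
    flagged = {}
    for name, train_values in training.items():
        spread = pstdev(train_values) or 1.0  # avoid dividing by zero
        shift = abs(mean(serving[name]) - mean(train_values)) / spread
        if shift > tolerance:
            flagged[name] = round(shift, 2)
    return flagged

# Made-up numbers purely for illustration.
training = {"age": [20, 30, 40], "visits": [1, 2, 3]}
serving = {"age": [21, 29, 41], "visits": [7, 8, 9]}

flagged = skew_report(training, serving)
print(flagged)  # only "visits" is reported as shifted
```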
Practices to reach MLOps
To successfully deploy ML models, companies have introduced what are called hybrid teams: groups that combine specializations such as DataOps, machine learning, and data engineering, which together provide the skills needed to get the job done. In some cases, a single specialist called an MLOps engineer can solve these problems. But in most cases, it is necessary to assemble a team made up of a Data Scientist or ML Engineer, a DevOps Engineer, and a Data Engineer.
This set of specializations is not set in stone. What matters most is that these individuals work together to achieve the goals set by your company.
It is also important to highlight that a Data Scientist alone will not be able to achieve the goals of MLOps. To make a model work in production, code modularization, reuse, testing, and versioning are necessary. This is why many companies are bringing in an ML Engineer, who contributes these skills. In many cases, ML Engineers are, in practice, performing many of the activities required for MLOps.
The advantage of a hybrid team is having the freedom to run and deploy models at your fingertips. But as stated above, it requires a well-rounded team of specialists, which can mean high costs and long hours on a single project. This could interfere with other departments that rely on the MLOps team to reach their periodic goals.
ML Pipelines
A pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in a time-sliced fashion.
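As a rough sketch of that definition, the snippet below chains three small steps so that each one receives the output of the previous one. The step names (`drop_missing`, `scale`, `round_values`) and the helper `run_pipeline` are hypothetical, chosen only for illustration, and the steps run sequentially rather than in parallel.

```python
from functools import reduce

def run_pipeline(steps, data):
    """Feed data through each step in order; each output becomes the next input."""
    return reduce(lambda value, step: step(value), steps, data)

def drop_missing(rows):
    """Remove empty readings."""
    return [r for r in rows if r is not None]

def scale(rows):
    """Rescale values relative to the largest one."""
    top = max(rows)
    return [r / top for r in rows]

def round_values(rows):
    """Round to two decimals for readability."""
    return [round(r, 2) for r in rows]

result = run_pipeline([drop_missing, scale, round_values], [5, None, 10, 20])
print(result)  # [0.25, 0.5, 1.0]
```

Because every step has the same shape (data in, data out), steps can be reused across pipelines or reordered without touching the runner.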
Specialized tools like Datagran help create, manage, and run these pipelines. Some of the benefits include code reuse, run-time visibility, management, and scalability. For example, Dominos used Datagran to make predictions and put models into production. Their process for operationalizing models used to take months, but with Datagran’s pipelines they were able to reduce it to minutes. This not only gave their data team more time to focus on what was truly important, but also dramatically increased efficiency.
Traditionally, two distinct ML pipelines were needed: the training pipeline and the serving pipeline. What they have in common is that their data transformations need to produce data in the same format, but their implementations can be very different. For example, the training pipeline usually runs over batch files that contain all features, while the serving pipeline often runs online and receives only part of the features in the requests, retrieving the rest from a database.
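One common way to keep the two pipelines in the same format is to route both through a single transformation function. The sketch below assumes made-up feature names (`age`, `plan`) and uses an in-memory dictionary, `FEATURE_STORE`, as a stand-in for the database; none of these names come from a real system.

```python
def to_features(record):
    """Single transformation shared by both paths so the output format cannot diverge."""
    return [record["age"] / 100.0, 1.0 if record["plan"] == "premium" else 0.0]

def training_rows(batch):
    """Training path: batch records already carry every feature."""
    return [to_features(record) for record in batch]

# Hypothetical stand-in for a feature database keyed by user id.
FEATURE_STORE = {"u1": {"plan": "premium"}}

def serving_row(request):
    """Serving path: the request carries some features; the rest are looked up."""
    stored = FEATURE_STORE[request["user_id"]]
    return to_features({**stored, **request})

batch = [{"age": 40, "plan": "premium"}]
request = {"user_id": "u1", "age": 40}

# Same user, same features, same vector on both paths.
print(training_rows(batch)[0] == serving_row(request))  # True
```

The two paths still gather their inputs differently, but the format-defining logic lives in one place, so a change to it reaches training and serving together.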
With Datagran, only one pipeline is needed, because integrations, model training, predictions, and the model’s output can all be configured with operators and actions.
Now, to achieve reproducibility, the team needs consistent model version tracking. In a conventional software environment, versioning code is sufficient since it defines all behavior. In machine learning, you must additionally keep track of model versions, as well as the data used to train them and certain metadata such as training hyperparameters.
What is typically done in this scenario is to use Git, a standard version control system, to track models and metadata. The problem is that the data is often too large to store this way efficiently or practically.
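A common workaround is to keep only a small fingerprint of the data in version control, next to the hyperparameters and the code version, while the large files live elsewhere. The sketch below illustrates that idea under stated assumptions: the manifest layout, the function names, and the commit id `abc1234` are hypothetical, and it does not describe how Datagran or any specific tool does it.

```python
import hashlib
import json

def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes."""
    return hashlib.sha256(payload).hexdigest()

def build_manifest(model_bytes, data_bytes, hyperparameters, code_version):
    """Describe one training run with small, diff-friendly metadata."""
    return {
        "model_sha256": fingerprint(model_bytes),
        "data_sha256": fingerprint(data_bytes),
        "hyperparameters": hyperparameters,
        "code_version": code_version,
    }

manifest = build_manifest(
    model_bytes=b"serialized-model",          # placeholder for a real model file
    data_bytes=b"age,plan\n40,premium\n",     # placeholder for a real dataset
    hyperparameters={"learning_rate": 0.1, "epochs": 5},
    code_version="abc1234",                   # hypothetical commit id
)

# The manifest is a few hundred bytes, so it is cheap to commit alongside the code.
print(json.dumps(manifest, indent=2, sort_keys=True))
```

If a single byte of the training data changes, `data_sha256` changes too, so the manifest ties a model to the exact data, code, and hyperparameters used to train it.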
A soon-to-be-released feature in Datagran, Data Modeling import, will enable teams to upload model versions, keep track of schedules, version data, and connect trained models to the exact versions of code, data, and hyperparameters that were used.
There is certainly a lot to learn about MLOps and its evolution. What is important is to keep your company on the cutting edge of innovation so that it does not miss opportunities that could help it reach its goals. Learn more about Datagran to build ML models and put them into production, fast.