If you are a Data Scientist, you have probably suffered from lack-of-version-control syndrome. You end up with many Jupyter notebooks that are just different versions of the same project. The problem then extends to APIs: once you deploy a model, it is hard to know which API you should use and which model version it corresponds to.
In the end, version control becomes a major pain point for organizations of every size.
Of course, GitHub or GitLab is the first thing that comes to mind. Although they solve a large part of the problem, they do not solve all of it.
First, you don't know which model is currently being used in production, and that uncertainty creates a disconnect between Data Science and XOPs. Ideally, you would know in real time which version of a model is running and where that model is sending its data.
Second, even if your model versions are neatly organized, rolling back from the latest version to an older one in production can be challenging. Data Scientists have to revert commits or re-upload the version they want, and then manually deliver the model to the XOPs team.
Third, GitHub is not well suited to collaborating with business units. Few people outside technical teams know how to use GitHub or where to find each model's description. Yet understanding what a model does, what parameters it takes, and what its ultimate goal is remains vital.
Fourth, neither GitHub nor GitLab is designed to organize deployed APIs. Organizing and finding APIs is becoming a headache for companies, even though knowing the purpose of each API and which APIs are actually in production is an absolute necessity.
We've spent a lot of time thinking about how to solve these issues. Here is how we can help you and your organization nail version control in Data Science:
- Model upload and versioning: We built a feature that lets you choose and upload a model and then add versions to it. You can change the default version at any time, add comments to each version, and see whether a model is IN USE. That visibility gives you a clear view of which models are currently operational.
- Choose your model version in production: In our Pipelines tool, when you select the Model Operator, you can choose which version to put into production and switch between versions whenever it suits you.
- Tags: You can organize your models with Tags, so you can quickly locate models and their versions as the number of models grows.
- APIs: As you create APIs with our API exporter or SQL API, you can tag them to organize and quickly locate them. That way, as your API usage scales, teams can manage APIs by objective, business unit, and more.
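To make these ideas concrete, here is a minimal in-memory sketch of a model registry with numbered versions, per-version comments, a switchable default version, an IN USE flag, and tags. It is a hypothetical illustration: the class and method names (`ModelRegistry`, `upload`, `set_default`, `find_by_tag`, and so on) and the artifact paths are our own inventions, not the API of our platform or of any library.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelVersion:
    """One uploaded version of a model, with an optional comment."""
    number: int
    artifact_path: str  # hypothetical location of the serialized model
    comment: str = ""
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Model:
    name: str
    versions: list[ModelVersion] = field(default_factory=list)
    default_version: int | None = None
    in_use: bool = False  # True while a production pipeline references this model
    tags: set[str] = field(default_factory=set)


class ModelRegistry:
    """Hypothetical in-memory registry; not the API of any real product."""

    def __init__(self) -> None:
        self._models: dict[str, Model] = {}

    def upload(self, name: str, artifact_path: str, comment: str = "") -> ModelVersion:
        # Create the model on first upload; every upload appends a new version.
        model = self._models.setdefault(name, Model(name))
        version = ModelVersion(len(model.versions) + 1, artifact_path, comment)
        model.versions.append(version)
        if model.default_version is None:
            model.default_version = version.number  # first upload becomes the default
        return version

    def set_default(self, name: str, number: int) -> None:
        # Rolling back is just pointing the default at an older version.
        model = self._models[name]
        if not any(v.number == number for v in model.versions):
            raise ValueError(f"{name} has no version {number}")
        model.default_version = number

    def default(self, name: str) -> ModelVersion:
        model = self._models[name]
        return next(v for v in model.versions if v.number == model.default_version)

    def mark_in_use(self, name: str, active: bool = True) -> None:
        self._models[name].in_use = active

    def tag(self, name: str, *tags: str) -> None:
        self._models[name].tags.update(tags)

    def find_by_tag(self, tag: str) -> list[str]:
        return sorted(m.name for m in self._models.values() if tag in m.tags)

    def models_in_use(self) -> list[str]:
        return sorted(m.name for m in self._models.values() if m.in_use)


# Example: two versions, promote v2, then roll back to v1.
registry = ModelRegistry()
registry.upload("churn", "models/churn_v1.pkl", "baseline")
registry.upload("churn", "models/churn_v2.pkl", "added features")
registry.set_default("churn", 2)
registry.mark_in_use("churn")
registry.tag("churn", "marketing", "retention")
registry.set_default("churn", 1)  # roll back without deleting any commits
print(registry.default("churn").artifact_path)  # models/churn_v1.pkl
```

The design choice worth noting is that rollback only moves a pointer (`default_version`) and never deletes history, which is what avoids the revert-and-re-upload cycle described earlier. The same tag index could be applied to API records to group them by objective or business unit.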
As data analytics evolves and new challenges arise, our goal is to keep closing the gap between data teams and the rest of the organization. With our set of tools, teams can take full control of their modeling and data workflows while delivering tangible results to stakeholders in minutes.