The phrase “machine learning” is everywhere. You keep hearing it, people talk about it as if it were magic that solves everything, and as a leader you want to solve the problems in your business. You started hiring data scientists and asked them to do X. You keep hearing that they are getting “very good accuracy results on the test data set”, and you like what you hear; you have been told that this is a good thing. After some time, you tried to use the model in your day-to-day operations, and it started failing. So, what is going on?
There can be many reasons, of course, but in this article I will focus on three non-technical ones:
- Not being able to separate the problem and the solution
- Not using the correct metrics
- Not iterating
Separate the problem and the solution
The introduction contains the following sentence:
You started hiring data scientists and asked them to do X …
How did you come up with X? Are there other Xs that the data scientist team could help with? X is probably the solution you think is best for your problem, but did you consider other solutions? Before moving on, let’s briefly define what a problem is and what a solution is.
- The problem is what you want to achieve
- The solution is how you want to achieve it
Here is an example of the problem-solution relationship:
I want to increase the conversion rate on my e-commerce website by developing a recommendation system.
Your business problem (what you want to achieve) is to increase the conversion rate and the solution you think is the best is to develop a recommendation system.
There can be multiple solutions to the same problem. You can
- build an ML-based recommendation system, or
- improve the quality of your website’s search engine.
In other words, there is usually more than one way to solve your problem. The data scientist team may jump to the first solution that comes to mind, or the business leader may ask the data scientist team to implement a solution she heard a big company uses. So, what to do here?
The answer is simple: as a leader, you need to explain what you want to achieve, and then brainstorm solutions with your team. Of course, this comes with challenges:
- the data scientist team needs to think more about the business rather than being purely technical,
- business leaders need to be more open, clearly explain the situation of the business, and understand the machine learning process.
Using the correct metrics
Let’s continue with the same “increase conversion rate + build a recommendation system” example. If you simply decide “let’s build a recommendation system” and then measure the performance of your data scientist team by the conversion rate alone, that could be misleading, because the conversion rate depends on more than the recommendations.
The extreme case of this example is using the company’s revenue as the metric to evaluate the data scientist team. As you can easily imagine, revenue does not depend only on the recommendation system’s results. So, what to do here?
Build intermediate metrics and measure them.
metric 1 -> metric 2 -> metric 3 ->… -> final metric
In the recommendation system example, the most immediate metric that the data scientist team can measure is the click-through rate (CTR), which measures how many of the recommended items have been clicked on. The next-level metric can be the add-to-basket rate, which measures how many of the recommended items have been added to the basket. And it goes on like that:
CTR -> add-to-basket rate -> metric x -> metric y-> conversion rate
Keep in mind that the further you move to the right, the less confidence the data scientist team will have in the metric. As a leader, you need to find the right balance between the business goals and the metrics that the data scientist team can have confidence in and can actually influence.
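To make the chain of metrics concrete, here is a minimal sketch of how the first two links (CTR and add-to-basket rate) could be computed. The event log, the event names ("shown", "clicked", "added_to_basket"), and the `funnel_rates` helper are assumptions made up for illustration; a real system would read these events from its own tracking data.

```python
from collections import Counter

# Hypothetical event log: one (recommended_item_id, event_type) pair per event.
events = [
    ("item-1", "shown"), ("item-1", "clicked"), ("item-1", "added_to_basket"),
    ("item-2", "shown"),
    ("item-3", "shown"), ("item-3", "clicked"),
    ("item-4", "shown"),
]

def funnel_rates(events, stages):
    """Return each later stage's event count divided by the first stage's count."""
    counts = Counter(event_type for _, event_type in events)
    base = counts[stages[0]]
    rates = {}
    for stage in stages[1:]:
        # Guard against an empty log so we never divide by zero.
        rates[stage] = counts[stage] / base if base else 0.0
    return rates

rates = funnel_rates(events, ["shown", "clicked", "added_to_basket"])
print("CTR:", rates["clicked"])
print("Add-to-basket rate:", rates["added_to_basket"])
```

Both rates use the number of recommended items shown as the denominator, which matches the definitions above. Metrics further to the right in the chain could be added as extra stages in the same way.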
Iterating
I rarely see an idea work on the first try, and machine learning projects are no exception. Developing a machine learning project is a highly iterative process, and expecting a successful final result on the first try is a mistake. The data scientist team may have good accuracy results on the test data set, but when it comes to the real world, the game changes.
Many problems, and many possible solutions, can come up when applying machine learning in the real world:
- if it is a recommendation system, a process can be developed to feed the clicks back to the recommendation engine so that it can adjust itself,
- if it is a computer vision application, maybe the light conditions in the factory are simply not good enough and the images that are fed to the vision system are too dark,
- maybe user behavior has changed and the data distribution that the data scientist team used to develop the machine learning system is no longer valid; that is what happened to credit card fraud detection systems when COVID hit, because people suddenly started using their credit cards for online shopping much more, and the change was sharp.
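The last point, a change in the data distribution, is something a team can monitor. The following is a minimal sketch, not a method described in this article: it uses the Population Stability Index (PSI), one common way to compare the distribution of a feature in the training data with the same feature in production. The feature (share of online transactions), the sample values, and the alert threshold of 0.25 are all illustrative assumptions.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one feature; 0.0 means identical bucket shares."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature: share of a customer's card transactions made online.
training = [i / 100 for i in range(100)]          # spread over the whole range
production = [0.5 + i / 200 for i in range(100)]  # shifted toward online use

psi_same = population_stability_index(training, training)
psi_shifted = population_stability_index(training, production)

ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb value, tune for your own data
print("PSI, training vs itself:", psi_same)
print("Drift alert:", psi_shifted > ALERT_THRESHOLD)
```

A check like this does not fix the model, but it tells the team that the data it trained on no longer looks like production, which is the signal to start another iteration.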
So, what to do here?
Accept that developing machine learning projects is a highly iterative process. Build a process through which the data scientist team gets feedback from the production system. Do not treat machine learning projects as purely technical problems; machine learning brings changes to the organization and may require collaboration among many moving parts of the company.
Conclusion
In this short article, I tried to explain three non-technical points that may prevent machine learning projects from producing value, and I also mentioned the challenges that come with them.
- Separating the problem and the solution is the key to finding the correct machine learning project to implement. This can be achieved by brainstorming; the challenge is that data scientist teams and business leaders need to understand each other’s side.
- Using the correct metrics is the key to measuring the project and the team. This can be achieved by finding intermediate metrics; the challenge is that data scientist teams may have less confidence if the metrics become blurry for them.
- Iterating is the key to creating a bridge between the real world and the data sets that the data scientist team works on. This can be achieved by building a process that feeds production/real-world data back to the data scientist team; the challenge is that business leaders may prefer to see this only as a technical problem rather than as an organizational change.