Best Practices for Machine Learning and Deep Learning Implementations

Hi Experts,

Please point me to or share best practices for machine learning and deep learning implementations.

Thanks,
Tony

You might want to be a bit more specific: there are many applications of machine learning and deep learning, so there is no way I could point to a single “best practice” for two very large, and not entirely similar, fields.

Hi Team,

Machine Learning Best practices

To implement machine learning algorithms correctly, organizations need to follow best practices. Here are ten things to take care of when building ML models and applications.

1. Identify the business problem and the right success metrics

Starting with a problem is a common machine learning practice precisely because it is necessary, yet people often make the mistake of de-prioritizing it. It is equally essential to establish success metrics, and beyond that, to make sure the success metrics you establish are the right ones.
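As an illustration of choosing the right success metric, here is a minimal sketch (the fraud-review scenario and the cost figures are hypothetical assumptions, not from the original post). It shows how a model with higher raw accuracy can still be the worse choice against a business-aligned metric:

```python
# Sketch: tying a success metric to the business problem (hypothetical
# fraud-review example; the cost figures are illustrative assumptions).
def accuracy(y_true, y_pred):
    """Fraction of correct predictions -- a generic ML metric."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def business_cost(y_true, y_pred, fp_cost=5.0, fn_cost=100.0):
    """A business-aligned metric: missed fraud (false negatives) is far
    more expensive than a needless manual review (false positives)."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 0:
            cost += fp_cost      # analyst reviews a legitimate transaction
        elif p == 0 and t == 1:
            cost += fn_cost      # fraudulent transaction slips through
    return cost

y_true = [0, 0, 0, 1, 1]
model_a = [0, 0, 0, 0, 0]   # predicts "never fraud"
model_b = [0, 1, 0, 1, 1]   # one false alarm, but catches both frauds

print(accuracy(y_true, model_a), business_cost(y_true, model_a))  # 0.6 200.0
print(accuracy(y_true, model_b), business_cost(y_true, model_b))  # 0.8 5.0
```

Judged on the business metric, model_b is clearly better; a generic metric alone would not have told you that so decisively.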

2. Just get started

At this juncture, a lot of people fail and never get started at all, due to various factors: pushing too hard to get everything just right, complicated technology, or a lack of buy-in. It is recommended to actually get started, even when you know you will have to recreate the application or model within a month. The learning you gain along the way is valuable in itself.

3. Gather the correct data

Classifying everything you possess and then deciding what is important is not the right way to proceed. The right way is to map out the data required to build your models and analyses, working backwards from the solution. Alongside actually getting started, assembling the right data is critical to your success. To determine what the right data is, you need to talk to people across the relevant business domains.
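Working backwards from the model can be made concrete with a simple data checklist, sketched below (the field names and churn scenario are hypothetical). The idea is to declare the fields the model needs first, then validate candidate data sources against that list:

```python
# Sketch: working backwards from the model to the data it needs
# (field names and the churn use case are hypothetical).
REQUIRED_FIELDS = {"customer_id", "purchase_amount", "purchase_date", "churned"}

def missing_fields(records):
    """Return the required fields absent from a sample record,
    so gaps are found before modeling starts."""
    if not records:
        return set(REQUIRED_FIELDS)
    return REQUIRED_FIELDS - set(records[0])

sample = [{"customer_id": 1, "purchase_amount": 19.99, "churned": 0}]
print(missing_fields(sample))  # {'purchase_date'}
```

A check like this surfaces missing data early, instead of after you have already classified everything you possess.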

4. Move the algorithms instead of your data

What usually happens is that people export all of their data from the database to run it through their model, then import the results back into the database to make predictions. This round trip takes hours or days, reducing the efficiency of the models and applications you build. Running the computation inside the database kernel, by contrast, takes far less time than exporting the data, and all the math happens where the data lives. When you keep your data within the database, you can build applications and models and score them in place, and you can use R packages with data-parallel invocations. This avoids data duplication and separate analytical servers, and it allows you to prepare data, build and score models and applications, and embed data preparation in hours rather than days.
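A minimal sketch of this idea, using the stdlib sqlite3 module as a stand-in for a production database (the table, rows, and linear-model coefficients are illustrative assumptions): the scoring expression is pushed into the SQL query, so no rows need to be exported to be scored.

```python
# Sketch: pushing the computation to the data instead of exporting rows.
# sqlite3 stands in for a production database; the linear "model"
# coefficients below are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age REAL, spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(25, 100.0), (40, 250.0), (31, 80.0)])

# Score every row inside the database with a single SQL expression,
# rather than pulling all rows into Python first.
W_AGE, W_SPEND, BIAS = 0.01, 0.002, -0.5
rows = conn.execute(
    "SELECT age, spend, ? * age + ? * spend + ? AS score FROM customers",
    (W_AGE, W_SPEND, BIAS),
).fetchall()
for age, spend, score in rows:
    print(age, spend, round(score, 3))
```

In a real warehouse the same pattern applies at scale: ship the (small) model to the (large) data, not the other way around.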

5. Initiate tests before the actual launch

Carrying out tests will tell you whether you are on the right track and will make you more confident in the model or application you have created. Along with testing, you should also have a contingency plan in case any sort of issue arises.
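Pre-launch checks can be as simple as a handful of assertions on the scoring function, sketched below (the toy model and the "visits" feature are hypothetical; real checks would run against held-out data):

```python
# Sketch: simple pre-launch sanity checks for a scoring function
# (the model here is a hypothetical stand-in).
def score(features):
    """Toy model: probability-like score clipped to [0, 1]."""
    raw = 0.3 + 0.1 * features.get("visits", 0)
    return max(0.0, min(1.0, raw))

def sanity_checks(model):
    # Outputs must stay in the valid probability range.
    assert 0.0 <= model({"visits": 0}) <= 1.0
    assert 0.0 <= model({"visits": 100}) <= 1.0
    # Monotonicity we expect from domain knowledge:
    # more visits should never lower the score.
    assert model({"visits": 5}) >= model({"visits": 1})
    # Graceful handling of missing features.
    assert 0.0 <= model({}) <= 1.0
    return True

print(sanity_checks(score))  # True
```

Checks like these catch gross regressions before launch; they complement, rather than replace, evaluation on real held-out data.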

6. Avoid dropping data while machine learning algorithms train

When a lot of data accumulates, an organization is tempted to drop the files it deems unnecessary. However, dropping these files while the machine learning algorithm is training can cause a variety of problems.
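One concrete way this bites, sketched with illustrative data (the records and the "drop old files" rule are hypothetical): discarding records that look unnecessary can silently shift the label distribution the model trains on.

```python
# Sketch: dropping "unneeded" records silently shifts the label
# distribution seen during training (data here is illustrative).
from collections import Counter

rows = [{"label": l, "days_old": d}
        for l, d in [(1, 400), (1, 380), (0, 10), (0, 20), (0, 15), (1, 5)]]

def label_dist(data):
    """Fraction of each label in the dataset."""
    counts = Counter(r["label"] for r in data)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

print(label_dist(rows))                          # balanced: 50/50
kept = [r for r in rows if r["days_old"] < 365]  # "drop the old files"
print(label_dist(kept))                          # now skewed 75/25
```

The model trained on `kept` learns a prior that no longer matches reality, which is exactly the kind of problem the practice above warns about.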

7. Avoid unaligned objectives

When reviewing the performance of your machine learning system, your team should look for issues that fall outside the scope of its current objectives. If the existing algorithm does not achieve your product goals, revise the objective rather than ignoring the gap.

8. Reuse code between training and serving

Reuse code between your serving and training pipelines on a regular basis. Serving involves online processing, while training is a batch task. To reuse code, build an object that is specific to your system, and store the results of any query in an easily readable form. Once you have collected all the information, whether training or serving, you should be able to run a common method to bridge between that readable object and what the machine learning system expects.
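The idea above can be sketched as a single feature-extraction function shared by both paths (the field names and bucketing logic are hypothetical). Because the batch training pipeline and the online serving path call the same code, they cannot drift apart:

```python
# Sketch: one feature-extraction function shared by batch training and
# online serving (field names and bucketing are hypothetical).
def extract_features(raw):
    """Single source of truth for feature logic -- any change here
    reaches training and serving together, avoiding skew."""
    return {
        "amount_bucket": min(int(raw.get("amount", 0)) // 100, 9),
        "is_weekend": 1 if raw.get("day") in ("sat", "sun") else 0,
    }

def training_pipeline(batch):
    """Batch path: featurize a whole historical dataset."""
    return [extract_features(r) for r in batch]

def serve(request):
    """Online path: featurize one live request with the same code."""
    return extract_features(request)

batch = [{"amount": 250, "day": "sat"}, {"amount": 40, "day": "mon"}]
print(training_pipeline(batch))
print(serve({"amount": 250, "day": "sat"}))
# A live request and its historical twin produce identical features.
```

The alternative, re-implementing the feature logic separately in each pipeline, is a classic source of training-serving skew.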

9. Use a simple model for ensembling

Unified models are the easiest to understand and debug; ensembles of models, however, work best when kept simple. If you want to keep things simple, each model should either be a base model or an ensemble that only takes the outputs of other models as input. Combining models that have been trained separately in ad-hoc ways can result in bad behavior. For an ensemble, use a simple model that only receives the outputs of your base models.
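A minimal sketch of such an ensemble (the two base models are hypothetical stand-ins): the ensemble itself is just an average over base-model scores, taking no raw features of its own, which keeps it easy to debug.

```python
# Sketch: an ensemble kept deliberately simple -- it only averages the
# outputs of independently trained base models (both are hypothetical
# stand-ins for real trained models).
def model_a(x):
    return 0.8 if x["clicks"] > 10 else 0.2

def model_b(x):
    return min(1.0, x["time_on_site"] / 600)

def ensemble(x, models=(model_a, model_b)):
    """Plain average of base-model outputs: no raw features,
    no nested ensembles -- easy to reason about and debug."""
    scores = [m(x) for m in models]
    return sum(scores) / len(scores)

x = {"clicks": 15, "time_on_site": 300}
print(ensemble(x))  # (0.8 + 0.5) / 2 = 0.65
```

If the combined score looks wrong, you can inspect each base model's contribution directly, which is much harder with deeply nested or feature-crossing ensembles.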

10. Metrics analysis

Pick an offline optimization metric that correlates with the product goals and objectives. An online A/B test result is often a good proxy for the product's objectives; you can establish the correlation between the two by tracking offline metrics and running various experiments. A metric should be easy to understand and interpret, which lets you compare different models readily. It is also a good idea to track the metric per user segment, i.e. locales, new users, very active users, stale users, and so on. Additionally, evaluate your metric on a held-out test set, not on the training or validation data.

Moreover, tracking metrics offline gives you a sense of how much the ranking of your new model differs from the existing one. In the majority of cases, you will need to track multiple metrics and find a way to balance them against each other.
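Per-segment tracking, as suggested above, can be sketched in a few lines (the segments and records are illustrative): a single global number can hide a segment where the model performs poorly.

```python
# Sketch: tracking a metric per user segment rather than one global
# number (segments and records are illustrative).
from collections import defaultdict

records = [
    {"segment": "new",    "y": 1, "pred": 1},
    {"segment": "new",    "y": 0, "pred": 1},
    {"segment": "active", "y": 1, "pred": 1},
    {"segment": "active", "y": 0, "pred": 0},
]

def accuracy_by_segment(records):
    """Accuracy computed separately for each user segment."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["segment"]] += 1
        hits[r["segment"]] += int(r["y"] == r["pred"])
    return {seg: hits[seg] / totals[seg] for seg in totals}

print(accuracy_by_segment(records))  # {'new': 0.5, 'active': 1.0}
```

Here the global accuracy is 0.75, which would mask the fact that the model is no better than a coin flip for new users.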

These are the best practices to consider for machine learning models and applications. Good data is a must, and placing it in object storage or in a database matters even more. You need a deep understanding of the data, along with a clear picture of how to use it and what to do with it.

Please let me know your thoughts

© Copyright 2013-2019 Analytics Vidhya