Practical implementation of Predictive Models

classification

#1

Hi,
I have learnt a lot about prediction models, mainly classification models. I have also built a classification model after a lot of analysis on my office Opportunity data, and have selected my model.

Now I need to know how to implement this model in a practical setting. I can’t seem to find a single resource that explains how to use a model in practice.
For example, new data will come in every week; how do I use the model to identify which opportunities will be a Win or a Loss?

Any guidance, please. I am really stuck now.

Thanks


#2

Hi @Sabby,

You can work on the hypothesis generation part, so that even when you get new data you can extract the same features from it. This will reduce the number of features and also make the model more effective.

If you change the features regularly, you have to rebuild the model for every new dataset. Instead, you can do feature engineering and extract the same features every time you get a new dataset. This will help the model predict the outputs effectively (see the sketch below).
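A minimal sketch of this idea in R, assuming hypothetical Opportunity columns such as `amount`, `created_date`, `close_date` and `stage` (invented for illustration, not Sabby's actual schema): one function derives the same feature set from the training extract and from every weekly extract.

```r
# Sketch: build the SAME features from any raw extract, old or new.
# Column names here are placeholder examples only.
build_features <- function(raw) {
  data.frame(
    amount    = raw$amount,
    days_open = as.numeric(as.Date(raw$close_date) - as.Date(raw$created_date)),
    stage     = factor(raw$stage, levels = c("Prospect", "Proposal", "Negotiation"))
  )
}
```

Because the factor levels and column layout are fixed inside the function, the model always receives inputs in the same shape, no matter which week's data you feed it.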

Here I have assumed that the new data contains totally different data points and different features.

If the new dataset has the same features as the one the model was trained on, you can load the previous model and continue training it, as in the sketch below.
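For that second case, a small R sketch, assuming the model was saved earlier (e.g. with `saveRDS()`) and that `old_feat` and `new_feat_labelled` are placeholder data frames built with the same feature engineering:

```r
# Sketch: the new data has the same features, so reuse the saved model directly.
model <- readRDS("opportunity_model.rds")          # hypothetical file name

# Score the new week's opportunities (a glm-style classifier is assumed here)...
new_scores <- predict(model, newdata = new_feat, type = "response")

# ...or refit on old + new labelled rows so the model keeps up with recent data.
model <- glm(won ~ ., data = rbind(old_feat, new_feat_labelled), family = binomial)
saveRDS(model, "opportunity_model.rds")
```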


#3

Hi Pulkit,
Thanks for the explanations. I think I am getting some of what you're hinting at. Please bear with me for being naive, but I have some more queries.

Can you explain “You can work on the hypothesis generation part” in a little more detail and in a practical way, like what you would do in a real office setup?

I understand what you said and yes, the features will remain the same. My data source is a SQL table, so it will be a combination of old data and new data added in the last week.

Also, taking this further: what do data scientists actually do to implement a model once they have chosen it?
For example, are there any steps to keep identifying predicted outcomes based on newly added data, say weekly, and present them in a Power BI dashboard or any other presentation for that matter?

Thanks,
Sabby


#4

Hi @Sabby
Well, you have a model built on data that is similar to your new weekly data. Simply put: you build your model in, let's say, R, you generate the RDS of the model and wrap it with interfaces. You are in Power BI, so I assume you work on Azure, for example; great, you can integrate R directly at the DB level (check the Microsoft documentation).
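A minimal sketch of that workflow, assuming a plain logistic regression as the classifier and placeholder names throughout (`won`, `train_feat`, `build_features()`, `opportunity_model.rds`):

```r
# Sketch: train the classifier, serialize it to an .rds file, and wrap scoring
# behind a single function ("the interface") that a weekly job can call.
model <- glm(won ~ amount + days_open + stage, data = train_feat, family = binomial)
saveRDS(model, "opportunity_model.rds")      # compact binary file, fast to reload

score_opportunities <- function(raw, model_path = "opportunity_model.rds") {
  model <- readRDS(model_path)
  predict(model, newdata = build_features(raw), type = "response")  # Win probabilities
}
```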

Now, the input data drifts from the original from time to time. First you build a statistical test: let's say a t-test if your features are near normally distributed, or a Kolmogorov-Smirnov (KS) test if you are not sure. You find a difference … well, now you have to retrain your model, rebuild the RDS (if in R) and reintegrate it. So usually we run the test on a regular basis, perhaps not every week but that is up to you, and retrain the model automatically. This means you have tests, metrics, etc. (a rough sketch follows below).
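A rough sketch of such a drift check in R, using the base `t.test()` and `ks.test()` functions on one placeholder numeric feature (`amount`); in practice you would loop over the important features:

```r
# Sketch: compare this week's feature distribution with the training distribution.
p_t  <- t.test(train_feat$amount, new_feat$amount)$p.value    # OK if roughly normal
p_ks <- ks.test(train_feat$amount, new_feat$amount)$p.value   # distribution-free check

if (min(p_t, p_ks) < 0.05) {
  # The inputs have drifted: retrain on the combined labelled data and refresh the RDS.
  model <- glm(won ~ ., data = rbind(train_feat, new_feat_labelled), family = binomial)
  saveRDS(model, "opportunity_model.rds")
}
```

The 0.05 threshold and the retrain-on-everything strategy are just one possible choice; the point is that the test, the retraining and the `saveRDS()` refresh can all run unattended on a schedule.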

Hope this helps.

Alain


#5

Hi @Sabby,

A hypothesis is a proposed explanation made on the basis of limited evidence. Just by looking at the problem statement we can make some interpretations based on our prior knowledge.
By the hypothesis generation part, I meant that if you get new features in your new data, you can use them to derive the existing features, which will reduce the effort of building the model again.
Once you have chosen a model, you implement it and try to improve its performance by different means. Possible ways to do so are:

  1. Tuning the hyperparameters of the model (a small sketch follows after this list).
  2. Combining more than one model to achieve better accuracy; this is called an ensemble technique.

To learn more about ensemble techniques, you can refer to this article.
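As a small illustration of point 1 above, here is what a cross-validated hyperparameter search could look like with the caret package; a random forest and a tiny `mtry` grid are assumed purely as an example, and `won` must be a factor for classification:

```r
library(caret)

# Sketch: 5-fold cross-validation over a small grid of the random forest's mtry.
ctrl <- trainControl(method = "cv", number = 5)
grid <- expand.grid(mtry = c(2, 4, 6))

fit <- train(won ~ ., data = train_feat, method = "rf",
             trControl = ctrl, tuneGrid = grid)

fit$bestTune   # the mtry value that performed best in cross-validation
```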


#6

Hi Alain,
Thanks for the inputs.
As mentioned, since I am very new I do not have all the terminology yet. Can you help me with the RDS bit?

Also, we do not use Azure; we have SQL Server as the DB, and I have queries that import the desired data from the relevant tables into the Power BI file, where I then build the dashboards. Is it possible to train and implement, perhaps automatically, to predict based on the new data or changes in the old data (not feature changes, only changes in the data values)?

Also, could you throw some light on how to use t-tests in this regard? Maybe you can direct me to some relevant posts.

I know my questions may be a bit basic. I guess I need a mentor very badly :slight_smile:

Thanks,
Sabby


#7

Hi @Sabby
Sorry for the jargon. In R, an RDS is a compressed format which allows you to store a model (actually anything) and load it back quickly.
Power BI is another story; it is not very dynamic. I faced this issue recently (thanks to my colleague who took the time to explain it to me). What we do is quite basic: we prepare some results based on the Power BI input choices (in our case), then if new data (values) arrive we rebuild the results over the input range again.
Not ideal, but from what was explained to me, Power BI has very limited capabilities for dynamic interaction, and since you do not have Azure (I did not either) it is more of a batch-processing setup (see the sketch below).
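Since everything here boils down to a weekly batch job, a rough sketch of what that batch could look like in R with the DBI and odbc packages; the server, database, table and column names below are all placeholders:

```r
library(DBI)

# Sketch: pull last week's opportunities from SQL Server, score them with the
# saved model, and write the predictions back so Power BI can just read a table.
con <- dbConnect(odbc::odbc(),
                 Driver   = "SQL Server",
                 Server   = "my-server",      # placeholder
                 Database = "SalesDB",        # placeholder
                 Trusted_Connection = "Yes")

new_raw <- dbGetQuery(con, "
  SELECT * FROM dbo.Opportunities
  WHERE LoadDate > DATEADD(day, -7, GETDATE())")

model  <- readRDS("opportunity_model.rds")
scores <- data.frame(OpportunityId   = new_raw$OpportunityId,
                     win_probability = predict(model,
                                               newdata = build_features(new_raw),
                                               type = "response"))

dbWriteTable(con, "OpportunityScores", scores, append = TRUE)
dbDisconnect(con)
```

The Power BI report then only needs to point at the `OpportunityScores` table, and the R script can be scheduled (Windows Task Scheduler, SQL Server Agent, etc.) to run after each weekly load.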

Hope this helps.
Best regards
Alain