How do I develop a system to recommend a marketing channel using data science?



I am working on a data science challenge. I would like some help on how to proceed in developing my solution… A snapshot of my dataset is shown below:

This dataset is collected by a drug making company trying to sell its drug to doctors of different specializations.

The drug company has run promotional activity for its brand.

The promotional activity has included calls made, emails sent and faxes sent. The dataset shows how many calls have been made (‘Calls Made’ column), how many calls were successfully completed (‘Calls Successfully Completed’ column), how many emails were sent (‘Emails Sent’ column), how many emails were opened (‘Emails Opened’ column), how many faxes were sent (‘Faxes Sent’ column). Brand 1 is the brand of the drug company. Brand 2 is the brand of its competitor.

The dataset also shows how many sales have been made for various brands:

- Brand 1 Sales (Company’s brand): number of sales of the drug company’s brand
- Brand 2 Sales (Competitor brand): number of sales of the competitor’s brand
- Total Branded Market Sales: number of sales of all brands (from all companies selling branded drugs) in the market
- Total Market (Branded + Unbranded) sales: number of sales of all branded and unbranded drugs
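
To make the column relationships concrete, here is a minimal sketch of ratio features that could be derived from the columns described above. The rows are made-up toy values, the new column names and the `safe_ratio` helper are hypothetical, and treating a zero denominator as a rate of 0 is an assumption rather than something the dataset prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; the column names follow the dataset description.
df = pd.DataFrame({
    "Calls Made": [10, 0, 4],
    "Calls Successfully Completed": [5, 0, 4],
    "Emails Sent": [20, 8, 0],
    "Emails Opened": [5, 2, 0],
    "Brand 1 Sales": [30, 10, 0],
    "Total Branded Market Sales": [60, 40, 0],
})


def safe_ratio(num: pd.Series, den: pd.Series) -> pd.Series:
    """Element-wise num / den, returning 0.0 where the denominator is 0."""
    den = den.replace(0, np.nan)      # avoid division by zero
    return (num / den).fillna(0.0)    # assumption: no activity -> rate of 0


# Per-channel response rates and the company's share of branded sales.
df["Call Completion Rate"] = safe_ratio(
    df["Calls Successfully Completed"], df["Calls Made"]
)
df["Email Open Rate"] = safe_ratio(df["Emails Opened"], df["Emails Sent"])
df["Brand 1 Branded Share"] = safe_ratio(
    df["Brand 1 Sales"], df["Total Branded Market Sales"]
)

print(df[["Call Completion Rate", "Email Open Rate", "Brand 1 Branded Share"]])
```

There is no "faxes received" column in the description, so no comparable response rate can be computed for the fax channel from these columns alone.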

My task is to develop an ML system using this dataset that will recommend which of the three channels (calls/emails/fax) to use for future promotional activity. How do I develop such a system? My understanding is that this is a classification problem with three classes (calls/emails/fax): for a prospective customer, I have to classify him/her into one of the three classes, which indicates what channel to use.

What kind of feature engineering should I do? What model can I use?


I have developed a solution for this problem. I clustered the dataset into three clusters (call, email, fax) and assigned each customer to the marketing channel corresponding to his/her cluster. I calculated a channel affinity score for each customer from the distance between the customer's data point and the cluster centroid.
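
The general idea can be sketched as follows. This is a minimal illustration and not the exact code from the notebook: the toy data, the choice of features, the rule for naming each cluster after a channel, and the `1 / (1 + distance)` affinity formula are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; the column names follow the dataset description.
df = pd.DataFrame({
    "Calls Successfully Completed": [9, 8, 10, 1, 0, 2, 1, 0, 1],
    "Emails Opened":                [1, 0, 2, 9, 10, 8, 0, 1, 1],
    "Faxes Sent":                   [0, 1, 1, 1, 0, 2, 9, 8, 10],
})
channels = ["calls", "emails", "fax"]  # same order as the columns above

# Standardize so that no single channel dominates the distance computation.
X = StandardScaler().fit_transform(df)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Assumption: name each cluster after the channel whose standardized
# centroid coordinate is largest.
cluster_to_channel = {
    c: channels[int(np.argmax(km.cluster_centers_[c]))] for c in range(3)
}

# km.transform gives the distance from every point to every centroid,
# shape (n_samples, n_clusters); pick the distance to the assigned one.
dist = km.transform(X)
own_dist = dist[np.arange(len(X)), km.labels_]

# Assumption: map distance to an affinity in (0, 1], closer = higher.
affinity = 1.0 / (1.0 + own_dist)

result = df.copy()
result["Recommended Channel"] = [cluster_to_channel[c] for c in km.labels_]
result["Affinity"] = affinity
print(result[["Recommended Channel", "Affinity"]])
```

One caveat with this kind of approach is that the clusters are unsupervised, so nothing guarantees that each cluster corresponds cleanly to exactly one channel on real data; the naming rule above can assign the same channel to two clusters.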

The solution is uploaded to a GitHub repo:

It contains a Jupyter notebook with the Python code for the solution, the dataset, and a PDF report explaining my approach.