I have a question about a customer classification problem (correct me if I have the problem type wrong). The problem is as follows:
- The marketing team wants to optimise a campaign so that it targets only potential customers (to save cost and time)
- The dataset contains all customers who are eligible for the product in the campaign (A), including demographics and other customer attributes
- A test campaign has been run on a random subset of customers from A (let’s call this B)
- In the end, the dataset records who signed up from both B and C (C is the customers who were not part of B, so any sign-ups in C happened without targeting)
- The objective is to identify who should be targeted and who should be left alone (either because they would sign up without being targeted or because they would not sign up even if they were targeted)
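
To make the setup concrete, here is a minimal sketch of how I picture the data. The rows are made up and the column names (`targeted`, `signed_up`, `age`) are my own assumptions, not the real schema. It only compares the sign-up rate in B (targeted) with the rate in C (not targeted).

```python
from collections import defaultdict

# Hypothetical rows: none of these values come from the real campaign data.
# targeted=True means the customer was in the test campaign (B);
# targeted=False means the customer was not (C).
customers = [
    {"id": 1, "age": 34, "targeted": True,  "signed_up": True},
    {"id": 2, "age": 51, "targeted": True,  "signed_up": False},
    {"id": 3, "age": 27, "targeted": True,  "signed_up": True},
    {"id": 4, "age": 45, "targeted": True,  "signed_up": False},
    {"id": 5, "age": 39, "targeted": False, "signed_up": True},
    {"id": 6, "age": 62, "targeted": False, "signed_up": False},
    {"id": 7, "age": 23, "targeted": False, "signed_up": False},
    {"id": 8, "age": 58, "targeted": False, "signed_up": False},
]


def signup_rate_by_group(rows):
    """Return {targeted_flag: share of customers in that group who signed up}."""
    counts = defaultdict(lambda: [0, 0])  # targeted -> [sign-ups, total]
    for row in rows:
        counts[row["targeted"]][0] += int(row["signed_up"])
        counts[row["targeted"]][1] += 1
    return {flag: signed / total for flag, (signed, total) in counts.items()}


rates = signup_rate_by_group(customers)
print("sign-up rate in B (targeted):    ", rates[True])
print("sign-up rate in C (not targeted):", rates[False])
```

Each customer appears in only one of the two groups, so for any individual I only ever see one of the two outcomes (targeted or not targeted).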
My understanding is as follows:
- My training data should be all the customers who have signed up for the product offer
- From this, I can then derive the features that are relevant
- Finally, my model should classify which customers should be targeted and which should be left alone
- Is this the right approach? Or is it not a classification problem by nature?
- When selecting a model, we should do some exploratory analysis first to understand what’s in the data, right? How does that relate to choosing which machine learning model to use?