How to approach the data when a stated assumption turns out to be false


I am working on the Analytics Vidhya HR Analytics challenge.

One of the given assumptions is that employees with a KPI score above 80% will be considered for promotion. But on cross-checking the training data, I found that employees who don’t meet this KPI criterion have also been promoted in the past.

So I am planning to remove all such data points/records/anomalies from the training dataset and keep only the records that are consistent with the KPI criterion (>80%) for promotion.
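A minimal sketch of the filtering described above, using only the standard library. It assumes the KPI criterion is available as a 0/1 flag; the field names `kpi_met` and `is_promoted` and the toy records are hypothetical stand-ins, not taken from the actual challenge files.

```python
# Toy records; "kpi_met" (1 if KPI score > 80%) and "is_promoted" are
# hypothetical field names standing in for the challenge's columns.
records = [
    {"employee_id": 1, "kpi_met": 1, "is_promoted": 1},
    {"employee_id": 2, "kpi_met": 1, "is_promoted": 0},
    {"employee_id": 3, "kpi_met": 0, "is_promoted": 1},  # contradicts the stated rule
    {"employee_id": 4, "kpi_met": 0, "is_promoted": 0},
    {"employee_id": 5, "kpi_met": 0, "is_promoted": 0},
]


def violates_kpi_rule(row):
    """True for employees who were promoted without meeting the KPI > 80% criterion."""
    return row["is_promoted"] == 1 and row["kpi_met"] == 0


anomalies = [r for r in records if violates_kpi_rule(r)]
filtered = [r for r in records if not violates_kpi_rule(r)]

print(f"{len(anomalies)} of {len(records)} records contradict the rule")
print([r["employee_id"] for r in filtered])
```

Counting the anomalies before dropping them is worthwhile, since how many there are decides whether removal is reasonable.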

Is there another approach or a better technique to follow? Please suggest.


If those observations are not significant, yes. But think about the problem: how would your model know about such cases in the future? Going forward, at prediction time your model will almost always make an error on all such cases.
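To make that concrete, here is a small sketch on made-up `(kpi_met, is_promoted)` pairs (a hypothetical layout, not the challenge data). After filtering, the promotion rate among employees without the KPI flag drops to exactly zero, so a model trained on the filtered set has no evidence that such promotions ever happen.

```python
# (kpi_met, is_promoted) pairs; toy data, not from the actual challenge.
full = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
# Drop the rows that contradict the "KPI > 80% for promotion" assumption.
filtered = [(k, y) for k, y in full if not (y == 1 and k == 0)]


def promotion_rate(rows, kpi_met):
    """Observed share of promoted employees among rows with the given KPI flag."""
    outcomes = [y for k, y in rows if k == kpi_met]
    return sum(outcomes) / len(outcomes)


rate_filtered = promotion_rate(filtered, kpi_met=0)  # 0 of 3 promoted
rate_full = promotion_rate(full, kpi_met=0)  # 1 of 4 promoted

# Promotions in the full data that a model trained on `filtered`,
# which never saw a promotion without the KPI flag, has no evidence for.
missed = sum(1 for k, y in full if k == 0 and y == 1)
print(rate_filtered, rate_full, missed)
```

The same gap would show up in the test set: every real promotion without the KPI flag becomes a likely false negative.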

Hello Mohit,

1) See, if you are building the model and in the past somebody was promoted even though their KPI score was below 80%, then this is a case of outliers, so handle those records with outlier treatment: if the outliers make up less than 3% of the data, remove those records, or impute them based on your assumption, or try different methods for handling outliers.
2) Build the model without relying on the assumption, because you have the real data. Let the algorithm learn from whatever data you have and make predictions from it, because real data is far better than manipulated data, and real data is what you will get in the real world.
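The two options can be combined into one rule: drop the contradicting records only when they are rare (option 1, with the 3% cut-off), otherwise keep everything and let the model learn from them (option 2). This is a sketch under assumptions: the `(kpi_met, is_promoted)` layout and the function name `handle_kpi_exceptions` are hypothetical, and the "impute" alternative is left out because the answer does not say how it would work.

```python
def handle_kpi_exceptions(rows, max_share=0.03):
    """Drop promoted-without-KPI rows only if they are rarer than max_share.

    rows: list of (kpi_met, is_promoted) pairs (hypothetical layout).
    max_share: the 3% cut-off from option 1.
    Returns the rows to train on and the observed share of exceptions.
    """
    exception = (0, 1)  # promoted without meeting the KPI criterion
    share = sum(1 for r in rows if r == exception) / len(rows)
    if share < max_share:
        # Option 1: rare enough to treat as outliers and remove.
        return [r for r in rows if r != exception], share
    # Option 2: too common to call noise, so keep the real data as-is.
    return list(rows), share


rare = [(1, 1)] * 20 + [(1, 0)] * 19 + [(0, 0)] * 10 + [(0, 1)]  # 1 in 50
common = [(1, 1)] * 3 + [(0, 0)] * 6 + [(0, 1)]  # 1 in 10

kept_rare, share_rare = handle_kpi_exceptions(rare)
kept_common, share_common = handle_kpi_exceptions(common)
print(len(kept_rare), share_rare)
print(len(kept_common), share_common)
```

With 2% exceptions the rows are dropped; with 10% they are kept, since at that rate they look like a real pattern rather than noise.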

© Copyright 2013-2019 Analytics Vidhya