How to approach data when a given assumption turns out to be false


I am working on the Analytics Vidhya HR Analytics challenge.

One of the assumptions given is that employees with a KPI score >80% will be considered for promotion. But on cross-checking the training data, I found that employees who don't meet this KPI criterion have also been promoted in the past.

So I am planning to remove all such records (anomalies) from the training data set and keep only the records that meet the KPI >80% criterion for promotion.

Is there any other approach or a better technique to follow? Please suggest.


If those observations are not significant, then yes, you can remove them. But think about the problem: how would your model know about such cases in the future? If you drop them, then at prediction time your model will almost always be wrong for all such cases.
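One way to judge whether those observations are "significant" is to count them before deciding to drop anything. Below is a minimal sketch in plain Python. The records are toy data invented for illustration, and the field names `kpi_met` and `is_promoted` are assumptions, not the challenge's actual column names.

```python
from collections import Counter

# Toy records, invented for illustration only: (kpi_met, is_promoted).
# kpi_met = 1 means the employee's KPI score was above 80%.
records = [
    (1, 1), (1, 0), (1, 1), (1, 0),
    (0, 0), (0, 0), (0, 1), (0, 0),
]

def promotion_crosstab(rows):
    """Count records for each (kpi_met, is_promoted) combination."""
    return Counter(rows)

def exception_share(rows):
    """Fraction of promoted employees who did NOT meet the KPI criterion."""
    promoted_kpi_flags = [kpi for kpi, promo in rows if promo == 1]
    if not promoted_kpi_flags:
        return 0.0
    misses = sum(1 for kpi in promoted_kpi_flags if kpi == 0)
    return misses / len(promoted_kpi_flags)

counts = promotion_crosstab(records)
share = exception_share(records)

print("Promoted without meeting KPI:", counts[(0, 1)])
print(f"Share of all promotions: {share:.0%}")
```

If the share is tiny, dropping those rows costs little. If it is not, it is probably safer to keep them and let the model use the KPI flag as one feature among others, since the same kind of employee can show up in the test data.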

© Copyright 2013-2019 Analytics Vidhya