Operations on train data vs the test data

hackathon

#1

I am treating outliers in Item_Visibility and found that outliers appear when Item_Visibility is grouped by Item_Type. Within each group, I replaced all outliers with the group's 0.5th percentile (as the minimum) and 95th percentile (as the maximum). The training data's Item_Visibility now has no outliers.

Now I want to remove outliers from the test data.
What should the criteria be? Should I use the min and max values from the training data set, or the min and max values of the test data?

The same question, put another way:
Sometimes we add a frequency-count feature for an ID. When it comes to the test data, should we calculate it separately for training and test, or should we join the training IDs with the test IDs?

Or do I need to merge the training and test data and then remove the outliers?

Please help.


#2

I need the dataset. usaidus123@gmalil.com


#3

I also need the dataset. jideade2069@yahoo.com


#4

Hi @jideade @balghari

You can download the dataset from here:


#5

Hi @BhanuPratap

In order to avoid repetition of the steps, you can merge your train and test sets and treat outliers on the complete set.
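
A rough sketch of what that could look like with pandas, in case it helps. The column names Item_Visibility and Item_Type come from the question. The helper cap_by_group, the "source" column, the toy values, and the 0.05/0.95 quantile cut-offs are all assumptions made for illustration, so adjust them to your own data and thresholds.

```python
import pandas as pd


def cap_by_group(df, value_col, group_col, lower_q, upper_q):
    """Return a copy of df with value_col clipped to per-group quantile bounds.

    Hypothetical helper for illustration, not part of any library.
    """
    grouped = df.groupby(group_col)[value_col]
    # transform() broadcasts each group's bound back to its rows
    lower = grouped.transform(lambda s: s.quantile(lower_q))
    upper = grouped.transform(lambda s: s.quantile(upper_q))
    out = df.copy()
    out[value_col] = out[value_col].clip(lower=lower, upper=upper)
    return out


# Toy data standing in for the real train and test files (made-up values)
train = pd.DataFrame({
    "Item_Type": ["Dairy"] * 5 + ["Snacks"] * 5,
    "Item_Visibility": [0.01, 0.02, 0.03, 0.04, 0.90,
                        0.05, 0.06, 0.07, 0.08, 0.95],
})
test = pd.DataFrame({
    "Item_Type": ["Dairy", "Dairy", "Snacks", "Snacks"],
    "Item_Visibility": [0.02, 0.80, 0.06, 0.99],
})

# Tag each row so the two sets can be separated again after the merge
train["source"] = "train"
test["source"] = "test"
combined = pd.concat([train, test], ignore_index=True)

# Treat outliers once, on the combined set (cut-offs are placeholders)
capped = cap_by_group(combined, "Item_Visibility", "Item_Type", 0.05, 0.95)

# Split back into train and test
train_capped = (capped[capped["source"] == "train"]
                .drop(columns="source").reset_index(drop=True))
test_capped = (capped[capped["source"] == "test"]
               .drop(columns="source").reset_index(drop=True))
```

If you would rather derive the bounds from the training data only and apply them to the test data, you can compute the per-group quantiles on train and clip the test rows with them. That keeps test values from influencing the thresholds, at the cost of one extra step.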