ValueError: Features

While applying my fitted model to the test dataset, I get an error that says:
ValueError: Set has 'X' features, expecting 'Y' features.

@rishu4398, please share a screenshot of the first few rows of both the training and test data, or share the files here if you can.

Test_wyCirpO.csv (623.5 KB)

Sorry for the late reply!!
I tried to upload my .ipynb code file, but that file type is not supported.
Should I upload a screenshot of the error?
I am uploading the train file in the next comment, as the forum does not allow two files in the same comment.

Train_pjb2QcD.csv (1.2 MB)

Hi @rishu4398,

From what I can see here, both datasets have the same number of features. The problem may be that you are still including the target variable in the training features, or that the number of columns changed while creating dummy variables.

Please check the column names of your train and test datasets right before you fit the model. You can use train.columns (and test.columns for the test set) to compare them, or use train.shape and test.shape to see how many columns differ.

Once you have the column names, you can see which columns are missing in the test set and fix them accordingly.
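A minimal sketch of that check with pandas, assuming both frames have already been preprocessed. The helper name compare_columns and the toy columns (f1_A, f1_B, f1_C) are hypothetical stand-ins for your own data:

```python
import pandas as pd


def compare_columns(train: pd.DataFrame, test: pd.DataFrame) -> dict:
    """Report the shapes and which columns appear in only one of the two frames."""
    train_cols = set(train.columns)
    test_cols = set(test.columns)
    return {
        "train_shape": train.shape,
        "test_shape": test.shape,
        # Columns the model saw during training that the test set lacks.
        "missing_in_test": sorted(train_cols - test_cols),
        # Columns present in the test set that the model never saw.
        "extra_in_test": sorted(test_cols - train_cols),
    }


# Hypothetical toy frames standing in for the preprocessed train/test data.
train = pd.DataFrame({"f1_A": [1, 0], "f1_B": [0, 1], "f1_C": [0, 0]})
test = pd.DataFrame({"f1_A": [1], "f1_B": [0]})

report = compare_columns(train, test)
print(report)
```

Here the report shows that f1_C exists only in train, which is exactly the kind of column that makes the model expect more features than the test set provides.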

I have checked that, but the error is still there.
The exact error message is in the attached screenshot!

Your model expects 17670 features, whereas your test data has only 10394 features.
They must be equal.

@sharoon, that's exactly what I am asking: how do I solve this issue?

Intuitively, you need to apply EXACTLY the same data preprocessing to both the train set and the test set.

Somewhere in your code, the train set and test set are diverging.
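One way to force the two sets back into agreement with pandas (a different technique from combining the data, sketched here as an assumption about your pipeline) is to one-hot encode each set, then reindex the test set onto the train set's columns: dummies the test set lacks are added and filled with 0, and test-only dummies are dropped. The helper name encode_like_train and the toy data are hypothetical:

```python
import pandas as pd


def encode_like_train(train: pd.DataFrame, test: pd.DataFrame, cat_cols: list):
    """One-hot encode both frames, then align test to exactly the train columns."""
    train_enc = pd.get_dummies(train, columns=cat_cols)
    test_enc = pd.get_dummies(test, columns=cat_cols)
    # Add dummy columns that test lacks (filled with 0) and drop any test-only
    # dummies, so both frames end up with the same columns in the same order.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc


# Hypothetical toy data: test is missing class C and has an unseen class D.
train = pd.DataFrame({"feature 1": ["A", "B", "C"], "feature 2": ["X", "Y", "X"]})
test = pd.DataFrame({"feature 1": ["A", "D"], "feature 2": ["X", "Y"]})

train_enc, test_enc = encode_like_train(train, test, ["feature 1", "feature 2"])
print(list(train_enc.columns) == list(test_enc.columns))  # True
```

The design choice here is that the train set defines the feature space, which matches what the fitted model expects; any category that only shows up in the test set is simply ignored.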

Edit: It is recommended that you remove the target variable altogether while preprocessing.
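A minimal sketch of separating the target before any preprocessing, assuming the target column is called "target" (a hypothetical name; substitute the real column from Train_pjb2QcD.csv). The helper split_features_target is also hypothetical:

```python
import pandas as pd

TARGET = "target"  # hypothetical name; replace with your dataset's target column


def split_features_target(train: pd.DataFrame, target: str = TARGET):
    """Separate the target so it never goes through feature preprocessing."""
    y = train[target]
    X = train.drop(columns=[target])
    return X, y


# Hypothetical toy data: only the train set carries the target.
train = pd.DataFrame({"feature 1": ["A", "B"], "feature 2": ["X", "Y"], "target": [0, 1]})
test = pd.DataFrame({"feature 1": ["A"], "feature 2": ["Y"]})

X_train, y_train = split_features_target(train)
# X_train and test now carry the same raw feature columns.
print(list(X_train.columns) == list(test.columns))  # True
```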

@rishu4398, please combine the train and test data, and then create the dummy variables. Here is a possible explanation. Suppose you have the following train and test data:

train data:

ID   feature 1   feature 2
1    A           X
2    B           Y
3    C           X
4    A           X
5    B           Y

test data:

ID   feature 1   feature 2
6    A           X
7    B           Y
8    A           X

In this case, when you apply dummy encoding to train and test separately, the train data gives you 5 features and the test data gives you only 4, since the test data has no instance of class C for feature 1.
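The mismatch can be reproduced with pandas on the toy data above. This sketch uses pd.get_dummies as the dummy-encoding step and leaves the ID column out of the encoding:

```python
import pandas as pd

train = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "feature 1": ["A", "B", "C", "A", "B"],
    "feature 2": ["X", "Y", "X", "X", "Y"],
})
test = pd.DataFrame({
    "ID": [6, 7, 8],
    "feature 1": ["A", "B", "A"],
    "feature 2": ["X", "Y", "X"],
})

cat_cols = ["feature 1", "feature 2"]
# Encoding each frame separately: the test set never sees class C.
train_dummies = pd.get_dummies(train[cat_cols])
test_dummies = pd.get_dummies(test[cat_cols])

print(train_dummies.shape[1])  # 5
print(test_dummies.shape[1])   # 4
print(sorted(set(train_dummies.columns) - set(test_dummies.columns)))  # ['feature 1_C']
```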

So, concatenate the train and test datasets, create the dummy variables on the combined data, and then separate them again.
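A minimal sketch of the concatenate, encode, then split approach with pandas, reusing the toy data above. Passing keys to pd.concat tags each row with its origin, so the combined frame can be split back cleanly:

```python
import pandas as pd

train = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "feature 1": ["A", "B", "C", "A", "B"],
    "feature 2": ["X", "Y", "X", "X", "Y"],
})
test = pd.DataFrame({
    "ID": [6, 7, 8],
    "feature 1": ["A", "B", "A"],
    "feature 2": ["X", "Y", "X"],
})

cat_cols = ["feature 1", "feature 2"]

# 1. Stack train and test; `keys` adds an outer index level marking each row's origin.
combined = pd.concat([train, test], keys=["train", "test"])
# 2. Create the dummy variables once, on the combined data, so every category is seen.
combined = pd.get_dummies(combined, columns=cat_cols)
# 3. Split back into train and test using the outer index level.
train_enc = combined.xs("train")
test_enc = combined.xs("test")

print(train_enc.shape, test_enc.shape)  # (5, 6) (3, 6)
```

Both parts now have the same 6 columns (ID plus 5 dummies); the test rows simply have all-zero values in the "feature 1_C" column. With the real data, drop the target from the train set before concatenating; otherwise the test rows end up with a target column full of NaN.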

@AishwaryaSingh Thank you!!
This solution works. Now I know what I was doing wrong.

@rishu4398, Glad it worked! :slight_smile:

© Copyright 2013-2019 Analytics Vidhya