ValueError: could not convert string to float: 'Rural'

python
label_encoding

#1

Hello all.

I ran the LabelEncoder code from the tutorial, followed by:

X_train = train_data.drop('Loan_Status', axis=1)
y_train = train_data['Loan_Status']
X_test = test_data

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

but when I run:

clf.predict(X_test)

I get ValueError: could not convert string to float: 'Rural', even though the Property_Area variable appears to be encoded when I look at the dataframe.


#2

Hi @warrior,

Have you encoded the variables in the test dataset too?

The error suggests that you have encoded only the train dataset. Encode the test dataset as well, and if you still get an error, feel free to ask here.
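
As a minimal sketch of what encoding both datasets could look like: the toy dataframes and the `categorical_cols` list below are made up for illustration, so substitute your own `train_data`, `test_data`, and column names. The idea is to fit each LabelEncoder on the train column and reuse that same fitted encoder on the test column, so both datasets share one category-to-integer mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for train_data / test_data; the values are made up for illustration.
train_data = pd.DataFrame({
    "Property_Area": ["Rural", "Urban", "Semiurban", "Urban"],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})
test_data = pd.DataFrame({"Property_Area": ["Urban", "Rural"]})

# Hypothetical list of categorical feature columns to encode.
categorical_cols = ["Property_Area"]

for col in categorical_cols:
    le = LabelEncoder()
    # Learn the category-to-integer mapping from the train column...
    train_data[col] = le.fit_transform(train_data[col])
    # ...and apply the same mapping to the test column.
    test_data[col] = le.transform(test_data[col])

print(test_data)
```

Note that `le.transform` raises a ValueError if the test column contains a category that never appeared in the train column, so that case would need separate handling.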


#3

Thanks @PulkitS. No, I hadn’t encoded the test set. Now that I have, I get this error:

ValueError: could not convert string to float: 'LP002990'

Does this mean I should include Loan_ID in the list of variables processed with LabelEncoder?


#4

Hi @warrior,

Did you include the Loan_ID variable when training the model? If so, drop it before training. ID variables should generally not be used as features, since they identify rows rather than describe them. Once you drop the ID variable from the train dataset, drop it from the test dataset as well, since the train and test datasets must contain the same variables.

After dropping the ID variable from both the train and test datasets, you should no longer get this error.
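
A minimal sketch of this step, assuming the other columns are already encoded. The toy dataframes and the ID values below are made up for illustration, so use your own `train_data` and `test_data` in their place.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy, already-encoded stand-ins for train_data / test_data (all values are made up).
train_data = pd.DataFrame({
    "Loan_ID": ["ID_001", "ID_002", "ID_003", "ID_004"],
    "Property_Area": [0, 2, 1, 2],
    "Loan_Status": [1, 0, 1, 1],
})
test_data = pd.DataFrame({
    "Loan_ID": ["ID_005", "ID_006"],
    "Property_Area": [2, 0],
})

# Drop the ID column and the target from the train features,
# and drop the ID column from the test set as well.
X_train = train_data.drop(["Loan_ID", "Loan_Status"], axis=1)
y_train = train_data["Loan_Status"]
X_test = test_data.drop("Loan_ID", axis=1)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(predictions)
```

If you need the IDs later, for example to build a submission file, you can still read them from `test_data["Loan_ID"]`, since `drop` returns a new dataframe and leaves `test_data` unchanged.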


#5

Thanks again for the explanation. It worked.