Error "could not convert string to float" while running randomForest model in Python




While trying to run a randomForest model on a dataset in Python, I encountered an error saying

ValueError: could not convert string to float: source_class

where source_class is a column in my dataframe.

Why does this error occur, and how can I fix it?
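The snippet below reproduces the error on a tiny, made-up DataFrame (only the column name source_class comes from the question; the values and the visits column are hypothetical). It is a minimal sketch: fitting scikit-learn's RandomForestClassifier on a frame that still contains a text column fails while the data is converted to a float array.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: "source_class" holds text labels, "visits" is numeric
X = pd.DataFrame({"source_class": ["web", "email", "web", "referral"],
                  "visits": [3, 1, 4, 2]})
y = [1, 0, 1, 0]

def fit_error_message(features, target):
    """Try to fit a random forest and return the ValueError message, if any."""
    try:
        RandomForestClassifier(n_estimators=10, random_state=0).fit(features, target)
    except ValueError as exc:
        return str(exc)
    return None

message = fit_error_message(X, y)
print(message)  # typically "could not convert string to float: ..."
```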



Hi @adityashrm21,

scikit-learn's random forest works only with numeric features, so try converting those string columns into numeric class labels. Here is the code for it:

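from sklearn import preprocessing

def convert(data):
    # Fit one LabelEncoder per string column and keep them all, so each column's
    # mapping can later be reversed with inverse_transform. Reusing one encoder
    # would overwrite its classes_ on every fit_transform call.
    # 'Employer_Name' and 'Source' are example columns; use your own (e.g. source_class).
    encoders = {}
    for col in ['Employer_Name', 'Source']:
        encoders[col] = preprocessing.LabelEncoder()
        data[col] = encoders[col].fit_transform(data[col])
    return data, encoders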


After this, try running the random forest again. Hope it solves your problem. For any other modelling help, refer here (it's my GitHub link).
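A minimal end-to-end sketch of the encode-then-fit flow, assuming a small hypothetical DataFrame with a text column source_class, a numeric column visits and a target column. The helper encode_columns is illustrative, not part of any library; it works on a copy and keeps one LabelEncoder per column.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; "source_class" is the text column from the question
df = pd.DataFrame({"source_class": ["web", "email", "web", "referral", "email", "web"],
                   "visits": [3, 1, 4, 2, 1, 5],
                   "target": [1, 0, 1, 0, 0, 1]})

def encode_columns(data, columns):
    """Label-encode the given text columns on a copy, one encoder per column."""
    data = data.copy()
    encoders = {}
    for col in columns:
        encoders[col] = LabelEncoder()
        data[col] = encoders[col].fit_transform(data[col])
    return data, encoders

encoded, encoders = encode_columns(df, ["source_class"])
X = encoded.drop(columns="target")
y = encoded["target"]

# With all features numeric, the forest fits without the ValueError
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
predictions = model.predict(X)
```

Keeping the encoders in a dict means each column's original categories remain available for decoding later.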



I need to transform one variable back into its original categories, but number.inverse_transform() isn't working. What should I do?
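One likely reason: a LabelEncoder remembers only the categories from its most recent fit (in classes_). If a single encoder is reused for several columns, as in a convert function that calls number.fit_transform on each column in turn, it ends up holding only the last column's categories. The sketch below uses hypothetical toy values to show the shared encoder decoding Employer_Name codes into Source labels, while one encoder per column round-trips correctly. If the codes exceed the last column's number of categories, inverse_transform raises a ValueError instead of silently mislabeling.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data using the column names from the answer above
df = pd.DataFrame({"Employer_Name": ["acme", "globex", "acme"],
                   "Source": ["web", "email", "referral"]})

# One shared encoder: each fit_transform call overwrites classes_
shared = LabelEncoder()
emp_codes = shared.fit_transform(df["Employer_Name"])
src_codes = shared.fit_transform(df["Source"])
# shared.classes_ now holds only the Source categories, so this decodes wrongly
wrong = shared.inverse_transform(emp_codes)

# One encoder per column keeps each mapping intact
encoders = {}
codes = {}
for col in df.columns:
    encoders[col] = LabelEncoder()
    codes[col] = encoders[col].fit_transform(df[col])
restored = encoders["Employer_Name"].inverse_transform(codes["Employer_Name"])
```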


Hi @adityashrm21,

In that case, I assume you are able to run your random forest. I don't know of a single function that does this, but it can be done with the following steps:

  • Take the variable from the original dataset and the corresponding transformed variable from the transformed dataset, and put them side by side in a data frame
  • Drop duplicate rows, so you have a lookup table containing each original value and the numeric value it was transformed into
  • Merge this table with the final dataset on the numeric variable
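The three steps can be sketched with pandas as follows. The data, the column name source_class_original, and the final frame with a score column are hypothetical stand-ins for your own datasets.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical original data and its label-encoded copy
original = pd.DataFrame({"source_class": ["web", "email", "web", "referral"]})
transformed = original.copy()
transformed["source_class"] = LabelEncoder().fit_transform(original["source_class"])

# Step 1: original and encoded values side by side
pairs = pd.DataFrame({"source_class_original": original["source_class"],
                      "source_class": transformed["source_class"]})

# Step 2: keep unique pairs -> lookup table (original value <-> numeric code)
lookup = pairs.drop_duplicates().reset_index(drop=True)

# Hypothetical final dataset that only carries the numeric codes
final = pd.DataFrame({"source_class": [2, 0, 1], "score": [0.9, 0.4, 0.7]})

# Step 3: merge on the numeric column to bring back the original categories
decoded = final.merge(lookup, on="source_class", how="left")
```

A left merge keeps every row of the final dataset in its original order, adding the recovered category as a new column.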

Hope this helps.