Getting "TypeError: '>' not supported between instances of 'str' and 'float'"

I'm getting "TypeError: '>' not supported between instances of 'str' and 'float'" in the code snippet below:

from sklearn.preprocessing import LabelEncoder
var_mod = ['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Property_Area', 'Loan_Status']
le = LabelEncoder()
for i in var_mod:
    df[i] = le.fit_transform(df[i])  # TypeError raised at this line
df.dtypes

Hi @plarion,
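There may be missing values in the dataset. Treat them before applying the label encoder: LabelEncoder sorts the unique values of each column, and a missing value (NaN, which is a float) cannot be compared with a string.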
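A minimal sketch of treating the missing values first, assuming a small hypothetical DataFrame in place of the real loan data. Filling with each column's most frequent value (the mode) is just one common choice of imputation, not necessarily the right one for every column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; np.nan marks a missing value
df = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Married": ["Yes", np.nan, "No", "Yes"],
})

var_mod = ["Gender", "Married"]
le = LabelEncoder()
for col in var_mod:
    # Fill missing entries with the column's most frequent value (the mode)
    df[col] = df[col].fillna(df[col].mode()[0])
    # Every value is now a string, so the encoder can sort them without a TypeError
    df[col] = le.fit_transform(df[col])

print(df)
```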


Thank you @jalFaizy, missing values were the issue.

I am getting the following error for this particular code:

Code:

from sklearn.preprocessing import LabelEncoder
var_mod = ['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Property_Area', 'Loan_Status']
le = LabelEncoder()
for i in var_mod:
    df[i] = le.fit_transform(df[i])
df.dtypes

Error:

Note: I did impute all the missing values.

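I would check every column to see whether any still has NULL values, e.g. df.isnull().sum() for the whole DataFrame, or df["Gender"].value_counts(dropna=False) for a single column (by default value_counts() leaves NaN out of the counts).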
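A small sketch, on a hypothetical column with one missing entry, of why plain value_counts() can hide missing values while value_counts(dropna=False) and isnull().sum() reveal them:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry
s = pd.Series(["Male", "Female", np.nan, "Male"], name="Gender")

print(s.value_counts())              # only Male and Female are listed
print(s.value_counts(dropna=False))  # NaN appears as its own row
print(s.isnull().sum())              # -> 1
```

On a full DataFrame, df.isnull().sum() gives the same missing-value count for every column at once.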

There are no NULL values in the data, but I'm still facing the same issue:

df['Dependents'].value_counts()
0 345
1 102
2 101
3+ 51
Name: Dependents, dtype: int64

df['Dependents'] = le.fit_transform(df['Dependents'])

TypeError: '>' not supported between instances of 'str' and 'float'

Any help will be appreciated.

It's because of the '3+' value. Better to map it manually, using map({'3+': 3}).


@T_Predict, or anyone: could you give more details on using the map function, please?

I tried the following after getting the error:

TypeError: '>' not supported between instances of 'str' and 'float'


df['Dependents'].value_counts()
df['Dependents']=map(float('3+'),3)
#df['Dependents'].map({'3+',3})
df['Dependents'].value_counts()
#df['Dependents'] = le.fit_transform(df['Dependents'])
#df['Dependents']

ValueError Traceback (most recent call last)
in ()
1 df['Dependents'].value_counts()
----> 2 df['Dependents']=map(float('3+'),3)
3 #df['Dependents'].map({'3+',3})
4 df['Dependents'].value_counts()
5 #df['Dependents'] = le.fit_transform(df['Dependents'])

ValueError: could not convert string to float: '3+'

Missing values are the issue.

Hey, you should have used
df['Dependents'] = df['Dependents'].map({'3+': 3})

Executing the above turns all the remaining values into NaN: Series.map returns NaN for any value that is not a key in the dict. You can confirm this by running df['Dependents'].isnull().sum() afterwards.
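A minimal sketch contrasting Series.map with Series.replace, assuming a hypothetical Dependents column stored entirely as strings. replace only rewrites the listed value and keeps everything else, so the column can then be converted to integers in one step.

```python
import pandas as pd

# Hypothetical sample of the Dependents column (all strings)
deps = pd.Series(["0", "1", "2", "3+", "0"], name="Dependents")

# Series.map: values that are not keys in the dict become NaN
mapped = deps.map({"3+": 3})
print(mapped.isnull().sum())  # -> 4

# Series.replace: only '3+' changes; the other values are kept as-is
replaced = deps.replace({"3+": "3"}).astype(int)
print(replaced.tolist())  # -> [0, 1, 2, 3, 0]
```

Replacing '3+' with the string '3' (rather than the int 3) keeps the column a single type before astype(int); leaving ints and strings mixed would trip the same sorting error in LabelEncoder.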

I'm not entirely sure why, but the only workaround I found was to export the DataFrame to a CSV file and read it back. After that, LabelEncoder works fine.

df.to_csv('F:/Datasets/Loan_Pred_traintest_cleaned.csv', index=False)
df = pd.read_csv('F:/Datasets/Loan_Pred_traintest_cleaned.csv')

If I find the reason, I will post it here.
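One plausible explanation, not confirmed in this thread: the column ended up mixing Python types (for example, ints from a fill step alongside strings), and read_csv re-infers a single type per column. The sketch below assumes such a hypothetical mixed column and uses an in-memory buffer (io.StringIO) instead of the file path above.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column mixing Python ints and strings, e.g. after filling
# missing values with the number 0 in a column that otherwise holds strings
df = pd.DataFrame({"Dependents": [0, "1", "2", "3+", 0]})
print(set(df["Dependents"].map(type)))  # both int and str are present

# Round-trip through an in-memory CSV instead of a file on disk
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df = pd.read_csv(buf)

# '3+' is not numeric, so read_csv keeps the whole column as strings
print(set(df["Dependents"].map(type)))  # {<class 'str'>}

le = LabelEncoder()
df["Dependents"] = le.fit_transform(df["Dependents"])
```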

It is due to the missing values in the columns Gender, Married and Dependents.
Use df.count() to see which columns have missing values (it counts the non-null entries per column), fill them, and then apply label encoding; it will work fine.
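A short sketch of the df.count() check, assuming a hypothetical three-row DataFrame: any column whose non-null count is below the number of rows has missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in two columns
df = pd.DataFrame({
    "Gender": ["Male", np.nan, "Female"],
    "Married": ["Yes", "No", np.nan],
    "Dependents": ["0", "3+", "1"],
})

# df.count() returns the number of non-null entries per column
non_null = df.count()
cols_with_missing = non_null[non_null < len(df)].index.tolist()
print(cols_with_missing)  # -> ['Gender', 'Married']
```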

I was getting a similar issue in a different context, but this solution gave me the hint to check for missing values. My list had some None entries in it, so sorting the list threw a similar error.
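The same failure can be reproduced with plain Python, using a hypothetical list: sorting compares elements with `<`, which is not defined between None and str. A sketch of filtering the None entries out first:

```python
values = ["b", None, "a", None, "c"]

try:
    sorted(values)
except TypeError as exc:
    # e.g. '<' not supported between instances of 'NoneType' and 'str'
    print("TypeError:", exc)

# Drop the None entries before sorting
cleaned = sorted(v for v in values if v is not None)
print(cleaned)  # -> ['a', 'b', 'c']
```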

Now what's the problem?

Hi @Freak, drop the ID variable from both the train and test sets, then train the model.


Hey @AishwaryaSingh, thanks for the response.
Can you list the steps so that I can move further? For example: first load the train and test datasets, then drop the ID variable, and so on.
I am not sure how to approach it.
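A compact sketch of the steps discussed in this thread (drop the ID, fill missing values, label-encode, train), not the approach taught in any particular course. It assumes tiny hypothetical train/test DataFrames in place of pd.read_csv(...), an ID column named Loan_ID, and logistic regression as an arbitrary model choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for the train and test CSV files
train = pd.DataFrame({
    "Loan_ID": ["id1", "id2", "id3", "id4"],  # assumed ID column name
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Dependents": ["0", "3+", "1", np.nan],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})
test = pd.DataFrame({
    "Loan_ID": ["id5", "id6"],
    "Gender": ["Female", np.nan],
    "Dependents": ["2", "0"],
})

# Step 1: drop the ID column, which carries no predictive information
train = train.drop(columns=["Loan_ID"])
test = test.drop(columns=["Loan_ID"])

# Step 2: fill missing values with the training-set mode of each column
features = ["Gender", "Dependents"]
for col in features:
    mode = train[col].mode()[0]
    train[col] = train[col].fillna(mode)
    test[col] = test[col].fillna(mode)

# Step 3: label-encode each feature, fitting on train and test values together
for col in features:
    le = LabelEncoder()
    le.fit(pd.concat([train[col], test[col]]))
    train[col] = le.transform(train[col])
    test[col] = le.transform(test[col])

# Step 4: encode the target, train a model, and predict on the test set
y = LabelEncoder().fit_transform(train["Loan_Status"])
model = LogisticRegression()
model.fit(train[features], y)
predictions = model.predict(test[features])
print(predictions)
```

Fitting each encoder on the combined train and test values avoids a "previously unseen labels" error when the test set contains a category that never appears in train.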

Hi @Freak,

Here is a free course that solves the loan prediction problem, covering all the necessary steps. I suggest that you go through this:


Thanks a lot!

Call df['Dependents'].unique(). I'm guessing you will see some values with quotes around them, meaning they are strings, and other values without quotes, meaning they are numbers. If that is the case, I would recommend converting the column to strings, e.g. with df['Dependents'].astype(str), before using the encoder.
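A minimal sketch of that check and fix, assuming a hypothetical Dependents column that mixes ints and strings:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column mixing ints (e.g. from an earlier fill step) and strings
df = pd.DataFrame({"Dependents": [0, "1", 2, "3+", 0]})
print(df["Dependents"].unique())  # ints print without quotes, strings with quotes

# Convert every value to str so the encoder compares like with like
df["Dependents"] = df["Dependents"].astype(str)

le = LabelEncoder()
df["Dependents"] = le.fit_transform(df["Dependents"])
print(le.classes_)  # all classes are now strings
```

One caveat: astype(str) also turns a real NaN into the string 'nan', which the encoder then treats as an ordinary category, so it is worth filling missing values before the conversion.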

© Copyright 2013-2019 Analytics Vidhya