Converting a categorical variable to numerical throws an error with the following code

pandas

#1

I was trying to convert Loan_Status (Y or N) to binary 1 and 0, and then impute missing values, using Python. I used the code below and it is throwing an error. Could you please check?

def coding(col,codeDict):
    colCoded = pd.Series(col, copy = True)
    for key, value in codeDict.items():
        colCoded.replace(key,value,inplace=True)
        return colCoded
#Coding  Loan_Status as Y=1,N=0
print ('Before Coding')
data['Loan_Status'].value_counts()
data['Loan_Staus_Present'] = coding(data['Loan_Status'],{'N':0,'Y':1})
print ('\n after coding:')
print (data['Loan_Status_Present'].value_counts())

It returns output with N coded as 0 but Y still as Y, whereas Y should be 1.

Thanks,
Bijay


#2

@Bijay,
The problem is at the highlighted line:

def coding(col,codeDict):
    colCoded = pd.Series(col, copy = True)
    for key, value in codeDict.items():
        colCoded.replace(key,value,inplace=True)
        return colCoded    # <----- this return is inside the for loop
#Coding  Loan_Status as Y=1,N=0
print ('Before Coding')
data['Loan_Status'].value_counts()
data['Loan_Staus_Present'] = coding(data['Loan_Status'],{'N':0,'Y':1})
print ('\n after coding:')
print (data['Loan_Status_Present'].value_counts())

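Put the return statement after the for loop, so that the function returns only once every key has been replaced. Note that the new column name is also misspelled as 'Loan_Staus_Present' in the assignment, which would raise a KeyError on the last line:

import pandas as pd

def coding(col, codeDict):
    # Work on a copy so the original column is left untouched
    colCoded = pd.Series(col, copy=True)
    for key, value in codeDict.items():
        colCoded.replace(key, value, inplace=True)
    return colCoded  # <-- dedented: runs after the loop has finished

# Coding Loan_Status as Y=1, N=0
print('Before coding:')
print(data['Loan_Status'].value_counts())
data['Loan_Status_Present'] = coding(data['Loan_Status'], {'N': 0, 'Y': 1})
print('\nAfter coding:')
print(data['Loan_Status_Present'].value_counts())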
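
To see why the position of the return matters, here is a small self-contained sketch that runs both versions on a made-up Series (the sample values and the names coding_buggy and coding_fixed are only for illustration, they are not from your data):

```python
import pandas as pd

def coding_buggy(col, code_dict):
    col_coded = pd.Series(col, copy=True)
    for key, value in code_dict.items():
        col_coded = col_coded.replace(key, value)
        return col_coded  # inside the loop: exits after the first key only

def coding_fixed(col, code_dict):
    col_coded = pd.Series(col, copy=True)
    for key, value in code_dict.items():
        col_coded = col_coded.replace(key, value)
    return col_coded  # after the loop: every key has been replaced

# Made-up sample standing in for data['Loan_Status']
sample = pd.Series(['Y', 'N', 'Y', 'N'], dtype=object)
codes = {'N': 0, 'Y': 1}

buggy = coding_buggy(sample, codes)
fixed = coding_fixed(sample, codes)

print(buggy.tolist())  # only 'N' was replaced, 'Y' is untouched
print(fixed.tolist())  # both 'N' and 'Y' were replaced
```

Because 'N' is the first key in the dictionary, the buggy version returns right after replacing 'N', which matches the output you described.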

Hope this helps.
P.S.: Take a look at LabelEncoder in sklearn.preprocessing.
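
A minimal sketch of what that could look like, assuming scikit-learn is installed and using a made-up Series in place of data['Loan_Status']. LabelEncoder assigns integer codes to the sorted class labels, so 'N' gets 0 and 'Y' gets 1 here. This sketch has no missing values in it; I am assuming you would impute or drop those before encoding.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample standing in for data['Loan_Status'] (no missing values)
status = pd.Series(['Y', 'N', 'Y', 'N'], dtype=object)

encoder = LabelEncoder()
encoded = encoder.fit_transform(status)  # NumPy array of integer codes

print(list(encoder.classes_))  # labels in sorted order; position = code
print(encoded.tolist())

# inverse_transform maps the integer codes back to the original labels
print(list(encoder.inverse_transform(encoded)))
```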

Regards,
Danish