Imputing Missing Values

missing_values

#1

How can I fill the missing values in Credit History using the Loan Status column?

For example, a missing Credit History value should be filled with 1 where Loan Status is 1, and with 0 where Loan Status is 0.

I am stuck and cannot move forward. Please help.


#2

Hello @ASHISH_17

You can try this:

df.loc[ (pd.isnull(df['Credit History'])) & (df['Loan Status'] == 1), 'Credit History'] = 1

And similarly,

df.loc[ (pd.isnull(df['Credit History'])) & (df['Loan Status'] == 0), 'Credit History'] = 0

Hope this helps.

Shubham


#3

Hey @ASHISH_17,

This should work:

Encode 'Y' as 1 and 'N' as 0

df['Loan_Status'] = df['Loan_Status'].map(lambda x: 1 if x == 'Y' else 0)

Fill Credit_History with the corresponding Loan_Status values

df['Credit_History'] = df['Credit_History'].fillna(df['Loan_Status'])
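To see the two steps above end to end, here is a minimal sketch on a made-up four-row frame. The sample data and the `status_code` helper name are assumptions for illustration, and the column names assume the underscore spelling used in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "Loan_Status": ["Y", "N", "Y", "N"],
    "Credit_History": [1.0, np.nan, np.nan, 0.0],
})

# Encode the target as 1/0 in a separate Series so the original column is kept.
status_code = df["Loan_Status"].map({"Y": 1, "N": 0})

# fillna aligns on the index, so each gap takes the value from its own row.
df["Credit_History"] = df["Credit_History"].fillna(status_code)
```

Keep in mind that Loan_Status is the prediction target, so this fill cannot be repeated on rows where the target is unknown.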


#4

Hey @pavscorp1911

Thanks, it worked.
Can you help me with one more step as well?

I want to fill the missing values of Loan Amount with the mean Loan Amount for each combination of the Education and Self Employed columns.


#5

Hey @ASHISH_17,
Assuming that there are no missing values in either Education or Self Employed, try this:

Compute the mean based on Education and Self Employed

table = train_data.pivot_table(index=['Education', 'Self_Employed'], values='LoanAmount', aggfunc='mean')

Function to fill in the missing values

def fill(x):
    if pd.isnull(x['LoanAmount']):
        return table[x['Education']][x['Self_Employed']]
    else:
        return x['LoanAmount']

Final allocation

train_data['LoanAmount'] = train_data.apply(lambda x: format(fill(x), '.2f'), axis=1)
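A different way to get the same group-mean fill, without a lookup table or a row-wise apply, is groupby with transform. This is a sketch of an alternative technique, not the method posted above. The sample rows and the `group_mean` name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the thread's column names.
train_data = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Graduate", "Not Graduate", "Not Graduate"],
    "Self_Employed": ["No", "No", "No", "Yes", "Yes"],
    "LoanAmount": [100.0, 200.0, np.nan, 50.0, np.nan],
})

# transform("mean") returns one value per row: the mean of that row's group,
# computed over the non-missing LoanAmount values only.
group_mean = (
    train_data.groupby(["Education", "Self_Employed"])["LoanAmount"]
    .transform("mean")
)

# Only the gaps are replaced; existing values are left as they are.
train_data["LoanAmount"] = train_data["LoanAmount"].fillna(group_mean)
```

Because no string formatting is involved, the column stays numeric.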


#6

Thanks for this info, it helped me a lot.

I had to change it a little by adding table.loc and removing the '.2f' format.


#7

# Return the most frequent value(s) of x, or NA when every value is equally frequent
Mode = function(x) {
  ta = table(x)
  tam = max(ta)
  if (all(ta == tam))
    mod = NA
  else
    if (is.numeric(x))
      mod = as.numeric(names(ta)[ta == tam])
    else
      mod = names(ta)[ta == tam]
  return(mod)
}

library(Hmisc)
library(rpart)
library(randomForest)
library(neuralnet)
library(nnet)
library(e1071)
library(modeest)

########################## TRAIN ###########################
train_loan$Married[train_loan$Married==""]<-Mode(train_loan$Married)
train_loan$Loan_Amount_Term[is.na(train_loan$Loan_Amount_Term)]<-Mode(train_loan$Loan_Amount_Term)
train_loan$Dependents[train_loan$Dependents==""]<-Mode(train_loan$Dependents)
train_loan$LoanAmount[is.na(train_loan$LoanAmount)]<-Mode(train_loan$LoanAmount)
train_loan$Self_Employed[train_loan$Self_Employed==""]<-Mode(train_loan$Self_Employed)
train_loan$Credit_History[is.na(train_loan$Credit_History)]<-Mode(train_loan$Credit_History)

train_loan$CoapplicantIncome<-as.integer(train_loan$CoapplicantIncome)
train_loan$Loan_Amount_Term<-as.factor(train_loan$Loan_Amount_Term)
train_loan$LoanAmount<-as.integer(train_loan$LoanAmount)

########################### TEST ###################################
test_loan$Married[test_loan$Married==""]<-Mode(test_loan$Married)
test_loan$Loan_Amount_Term[is.na(test_loan$Loan_Amount_Term)]<-Mode(test_loan$Loan_Amount_Term)
test_loan$Dependents[test_loan$Dependents==""]<-Mode(test_loan$Dependents)
test_loan$LoanAmount[is.na(test_loan$LoanAmount)]<-Mode(test_loan$LoanAmount)
test_loan$Self_Employed[test_loan$Self_Employed==""]<-Mode(test_loan$Self_Employed)
test_loan$Credit_History[is.na(test_loan$Credit_History)]<-Mode(test_loan$Credit_History)

test_loan$Credit_History<-as.factor(test_loan$Credit_History)
test_loan$CoapplicantIncome<-as.integer(test_loan$CoapplicantIncome)
test_loan$Loan_Amount_Term<-as.factor(test_loan$Loan_Amount_Term)
test_loan$LoanAmount<-as.integer(test_loan$LoanAmount)


#8

Can you please explain the last step, the final allocation: format(fill(x), '.2f')?


#9

Hey @ishoo_bhardwaj,
This just formats the value to two decimal places.


#10

Hey @pavscorp1911
I filled the missing values using that code with a few changes, but now the dtype is object instead of float.
I tried using astype, but it failed.

Is there any reason why it changed to object dtype instead of float?
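One likely reason, given the final allocation step quoted earlier in the thread: Python's built-in format() returns a string, so a column built from its results holds text rather than floats. The sketch below uses made-up values. `pd.to_numeric` and `Series.round` are standard pandas calls, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical loan amounts.
amounts = pd.Series([128.0, 146.412, 66.0])

# format() produces strings such as '146.41', so this Series holds text.
formatted = amounts.apply(lambda v: format(v, ".2f"))

# Option 1: convert the text back to numbers.
# errors="coerce" turns anything that cannot be parsed into NaN.
as_float = pd.to_numeric(formatted, errors="coerce")

# Option 2: skip the strings entirely and round the numeric column.
rounded = amounts.round(2)
```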


#11

Thank you for your help.


#12

Hello everyone,

I just registered for the Loan Prediction competition and I am stuck on one problem.

I am able to fill the missing values for the integer-type variables, but I have a problem with the categorical variables.

For example, the variable Gender contains "Male", "Female", and some missing values, but a blank (space) is also being treated as a category of its own.

Please help me change this.


#13

Map one of Male or Female to 1 and the other to 0.
Then fill the missing values however you wish.

train['Variable_name'] = train['Variable_name'].map({"Male": 1, "Female": 0})
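For the blank entries that show up as their own category, one option is to turn them into real missing values before encoding. A minimal sketch, assuming the blanks are empty or whitespace-only strings. The sample data and the `gender` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a space and an empty string standing in for missing data.
train = pd.DataFrame({"Gender": ["Male", "Female", " ", "", "Male"]})

# Strip surrounding whitespace, then treat empty strings as missing.
gender = train["Gender"].str.strip().replace("", np.nan)

# Values that are not keys of the mapping (including NaN) come out as NaN,
# so they can be filled afterwards with whatever rule is preferred.
train["Gender"] = gender.map({"Male": 1, "Female": 0})
```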


#14

I think you should take a look at Imputer, a built-in tool for this kind of missing-value filling, and also at LabelEncoder.
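The post most likely refers to scikit-learn. Older releases exposed `sklearn.preprocessing.Imputer`, which newer releases replace with `sklearn.impute.SimpleImputer`. The sketch below assumes a recent scikit-learn and uses made-up data. The `Gender_code` column name is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column with one missing entry.
train = pd.DataFrame({"Gender": ["Male", "Female", np.nan, "Male"]}, dtype=object)

# "most_frequent" fills gaps with the most common value and works on text columns.
imputer = SimpleImputer(strategy="most_frequent")
filled = imputer.fit_transform(train[["Gender"]]).ravel()

# LabelEncoder assigns integer codes to the sorted class labels.
encoder = LabelEncoder()
train["Gender_code"] = encoder.fit_transform(filled)
```

`encoder.classes_` holds the labels in code order, which is useful for mapping the integers back to text.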


#16

I am getting an error while executing the final allocation:
train_data['LoanAmount'] = train_data.apply(lambda x: format(fill(x), '.2f'), axis=1)

The error is: KeyError: ('Graduate', 'occurred at index 0')
Please suggest a solution.


#17

I am also getting the same error. What should I do?
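A probable cause of this KeyError: pivot_table returns a DataFrame whose only column is LoanAmount, so table[x['Education']] looks for a column named 'Graduate' and fails. Looking the value up through table.loc, as mentioned in post #6, avoids this. The following is a sketch on made-up rows, not a confirmed fix for every setup. It also drops the format() call so the result stays numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the thread's column names.
train_data = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Graduate", "Not Graduate", "Not Graduate"],
    "Self_Employed": ["No", "No", "No", "Yes", "Yes"],
    "LoanAmount": [100.0, 200.0, np.nan, 50.0, np.nan],
})

# Mean LoanAmount per (Education, Self_Employed) pair; the pairs form the row index.
table = train_data.pivot_table(
    index=["Education", "Self_Employed"], values="LoanAmount", aggfunc="mean"
)

def fill(row):
    """Return the group mean for a missing LoanAmount, else the existing value."""
    if pd.isnull(row["LoanAmount"]):
        # Select the row by its index pair, then the LoanAmount column.
        return table.loc[(row["Education"], row["Self_Employed"]), "LoanAmount"]
    return row["LoanAmount"]

train_data["LoanAmount"] = train_data.apply(fill, axis=1)
```

As post #5 notes, this assumes Education and Self_Employed have no missing values; a row with a missing key has no matching entry in the table.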


#18

How should I fill the missing Self Employed values?
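Nobody answered this directly, but the R code in post #7 fills Self_Employed with its mode (the most frequent value). A pandas sketch of that same idea might look like the following. The sample data and the `most_common` name are assumptions, and whether the mode is a good choice depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries.
train = pd.DataFrame({"Self_Employed": ["No", "Yes", np.nan, "No", np.nan]})

# mode() ignores missing values and returns the most frequent value(s);
# [0] picks the first one if there is a tie.
most_common = train["Self_Employed"].mode()[0]

train["Self_Employed"] = train["Self_Employed"].fillna(most_common)
```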


#19

Hey,
can you please provide the loan prediction dataset, or suggest where I can get it?


#21

error in final allocation…
df[‘LoanAmount’] = df.apply(lambda x : format(fill(x),’.2f’),axis=1
^
SyntaxError: invalid character in identifier


#22

Hi @rks_ml,

This works for me, please check again and let me know

df['LoanAmount'] = df.apply(lambda x : format(fill(x),'.2f'),axis=1)