Imputing Missing Values


#23

Thanks, this is working. Thanks again for the help.


#24

Can anyone tell me the best way to impute missing values for Gender, Married, Dependents and Loan_Amount_Term, with code? Thanks in advance.


#25

df[‘LoanAmount’] = np.log(df[‘LoanAmount’])
This shows the error:
'str' object has no attribute 'log'


#26

Hi @rks_ml

Try using this:
df['LoanAmount'] = np.log(df['LoanAmount'])

Also, the problem you are facing is due to the inverted commas you used.


#27

Hi @rks_ml

Here is my approach to filling missing values. You can make different assumptions and work accordingly.

  1. Check the correlation of the feature with other features. If the missing values can be filled using values from other features, try that.
    For example, married status can depend on age.

  2. Is it possible that the user meant to enter NA or nil? In that case, fill with None or 0.
    For example, Gender or age cannot be nil, but loan_credit_history can be.

  3. As a last option, if nothing else works, fill the values with the mean, median or mode, depending on the data.

Refer to this article for missing value treatment.
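
The three options above could be sketched roughly as follows. This is only an illustration on made-up data: the column names Education, Self_Employed and Credit_History are assumptions about the loan dataset, and filling Credit_History with 0 is just one possible choice for option 2.

```python
import numpy as np
import pandas as pd

# Toy data; the column names are assumed, not taken from the real file.
df = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Not Graduate", "Not Graduate", "Graduate"],
    "Self_Employed": ["No", "No", "Yes", "Yes", "No"],
    "LoanAmount": [100.0, np.nan, 80.0, np.nan, 140.0],
    "Credit_History": [1.0, np.nan, 0.0, 1.0, 1.0],
    "Gender": ["Male", None, "Female", "Male", "Male"],
})

# Option 1: use other features. Each missing LoanAmount gets the mean
# of the rows that share its Education and Self_Employed values.
group_mean = df.groupby(["Education", "Self_Employed"])["LoanAmount"].transform("mean")
df["LoanAmount"] = df["LoanAmount"].fillna(group_mean)

# Option 2: a missing entry may itself mean "none", so fill a constant
# (0 is an assumption here, pick whatever fits your data).
df["Credit_History"] = df["Credit_History"].fillna(0)

# Option 3: fall back to the most frequent value.
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

print(df)
```

Assigning the result back with df[col] = df[col].fillna(...) avoids relying on inplace=True, which can behave differently across pandas versions.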


#28

Hi Aishwarya Singh,
It is showing the same error again:
df[‘LoanAmount’] = np.log(df[‘LoanAmount’])

AttributeError: 'str' object has no attribute 'log'


#29

Thank you


#30

I want to fill the missing values of Loan Amount using the Education and Self Employed columns, filling each missing value with the mean Loan Amount based on both the colum…
Please send me the code.


#31

Check the dtype of your loan amount column.

Apart from that -

  1. It is not advisable to assign the result back to the same column, because you will lose the original loan amount values. Create a new column such as LoanAmount_log.
  2. Prefer log10 for amounts
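
A small sketch of these suggestions, on made-up data. It assumes the amounts were read in as strings, which is one common way to end up with a 'str' object error from np.log. Your column may have a different problem, so check the dtype first.

```python
import numpy as np
import pandas as pd

# Toy frame where the amounts are strings instead of numbers.
df = pd.DataFrame({"LoanAmount": ["100", "200", None, "1000"]})

# Step 1: inspect the dtype. A non-numeric dtype here means the values
# are not numbers yet.
print(df["LoanAmount"].dtype)

# Step 2: convert to numbers; anything that cannot be parsed becomes NaN.
df["LoanAmount"] = pd.to_numeric(df["LoanAmount"], errors="coerce")

# Step 3: keep the original column and put the transform in a new one.
df["LoanAmount_log"] = np.log10(df["LoanAmount"])

print(df)
```

Missing amounts stay NaN in LoanAmount_log, so impute them before or after the transform as you prefer.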

#32

I tried these -

df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
df['Married'] = df['Married'].fillna(df['Married'].mode()[0])
df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(df['Loan_Amount_Term'].mode()[0])

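For Dependents -
# treat '3+' as 3 and make the whole column numeric so a median can be taken
df['Dependents'] = pd.to_numeric(df['Dependents'].replace({'3+': 3}))
impute_gr2 = df.pivot_table(values=['Dependents'], index=['Gender', 'Married'], aggfunc='median')
# fill each missing value with the median of its (Gender, Married) group
for i, row in df.loc[df['Dependents'].isnull(), :].iterrows():
    ind = (row['Gender'], row['Married'])
    df.loc[i, 'Dependents'] = impute_gr2.loc[ind].values[0]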
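
The same group-wise fill for Dependents can also be written without the loop. This is a rough sketch on made-up data, keeping the assumption from above that '3+' is treated as 3.

```python
import pandas as pd

# Toy data with the same three columns used above.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "Married": ["Yes", "Yes", "Yes", "No", "No"],
    "Dependents": ["0", "3+", None, "1", None],
})

# Make Dependents numeric; '3+' becomes 3 (an assumption, not a fact about the data).
df["Dependents"] = pd.to_numeric(df["Dependents"].replace({"3+": "3"}))

# Median of Dependents within each (Gender, Married) group, aligned row by row.
group_median = df.groupby(["Gender", "Married"])["Dependents"].transform("median")

# Fill only the missing entries with their group's median.
df["Dependents"] = df["Dependents"].fillna(group_median)

print(df)
```

Rows where Gender or Married is itself missing do not belong to any group, so fill those two columns first (for example with the mode, as above).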