Imputing Missing Values

missing_values

#23

Thanks, this is working. Thanks again for the help.


#24

Can anyone tell me the best way to impute missing values for Gender, Married, Dependents and Loan_Amount_Term, with code? Thanks in advance.


#25

df[‘LoanAmount’] = np.log(df[‘LoanAmount’])
It shows this error:
‘str’ object has no attribute ‘log’


#26

Hi @rks_ml

Try using this:
df['LoanAmount'] = np.log(df['LoanAmount'])

Also, the problem you are facing is due to the inverted commas you used: they are curly quotes (‘ ’), and Python only accepts straight ones (' or ").


#27

Hi @rks_ml

Here is my approach to filling missing values. You can make different assumptions and work accordingly.

  1. Check the correlation of the feature with other features. If the missing values can be filled using values from other features, try that.
    For example, marital status can depend on age.

  2. Is it possible that the user meant to leave the field as NA or nil? In that case, fill with None
    or 0.
    For example, Gender or age cannot be nil, but loan_credit_history can be.

  3. As a last option, if nothing else works, fill the values with the mean, median or mode, depending on the data.

Refer to this article for missing value treatment.
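
A minimal sketch of options 2 and 3, in case it helps. The toy DataFrame and its values are made up for illustration, and the column names only approximate the loan dataset discussed in this thread.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the loan dataset (values are invented).
df = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male"],
    "Credit_History": [1.0, np.nan, 0.0, 1.0],
    "LoanAmount": [120.0, np.nan, 100.0, 500.0],
})

# Option 3: mode for a categorical column ...
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
# ... and median for a skewed numeric column.
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())

# Option 2: if a blank plausibly means "nothing on record", fill an explicit 0.
# Whether that assumption holds depends on how the data was collected.
df["Credit_History"] = df["Credit_History"].fillna(0)

print(df)
```

The median is used for LoanAmount here because a few large loans can pull the mean upward; for roughly symmetric data the mean works just as well.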


#28

Hi Aishwarya Singh,
it is showing the same error again:
df[‘LoanAmount’] = np.log(df[‘LoanAmount’])

AttributeError: ‘str’ object has no attribute ‘log’


#29

Thank you.


#30

I want to fill the missing values of Loan Amount using the Education and Self Employed columns, that is, fill each missing value with the mean Loan Amount computed from both columns…
Please send me the code.


#31

Check the dtype of your loan amount column.

Apart from that -

  1. It is not advisable to assign the result back to the same feature. You will lose the original loan amount column. Create a new one, such as LoanAmount_log.
  2. Prefer log10 for amounts.
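
A rough sketch of the dtype check and the two points above. It assumes the column was read in as text, which is one common way to get the "'str' object has no attribute 'log'" error; the toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy column: numbers stored as text, with one blank (values are invented).
df = pd.DataFrame({"LoanAmount": ["100", "1000", None, "10"]})

# A text/object dtype here means np.log is being handed strings.
print(df["LoanAmount"].dtype)

# Convert to numbers; anything that cannot be parsed becomes NaN.
df["LoanAmount"] = pd.to_numeric(df["LoanAmount"], errors="coerce")

# Keep the original column and store the transform in a new one.
df["LoanAmount_log"] = np.log10(df["LoanAmount"])

print(df)
```

Missing amounts stay NaN after the transform, so impute them before or after taking the log, whichever suits your approach.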

#32

I tried these -

df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
df['Married'] = df['Married'].fillna(df['Married'].mode()[0])
df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(df['Loan_Amount_Term'].mode()[0])

For Dependents -
# needs: import pandas as pd
# '3+' is text, so map it to '3' and make the whole column numeric
df['Dependents'] = pd.to_numeric(df['Dependents'].replace({'3+': '3'}))
# median Dependents for each (Gender, Married) group
impute_gr2 = df.pivot_table(values=['Dependents'], index=['Gender', 'Married'], aggfunc='median')
for i, row in df.loc[df['Dependents'].isnull(), :].iterrows():
    ind = (row['Gender'], row['Married'])
    df.loc[i, 'Dependents'] = impute_gr2.loc[ind].values[0]
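
The same group-wise idea can be written without a loop, and it also covers the earlier request to fill Loan Amount from Education and Self Employed. This is only a sketch: the column name Self_Employed and the toy values are assumptions, so adjust them to your data.

```python
import numpy as np
import pandas as pd

# Toy data (values are invented; Self_Employed is an assumed column name).
df = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Graduate", "Not Graduate", "Not Graduate"],
    "Self_Employed": ["No", "No", "No", "Yes", "Yes"],
    "LoanAmount": [100.0, 200.0, np.nan, 80.0, np.nan],
})

# Mean LoanAmount of each (Education, Self_Employed) group, aligned row by row.
group_mean = df.groupby(["Education", "Self_Employed"])["LoanAmount"].transform("mean")

# Fill only the missing rows with their own group's mean.
df["LoanAmount"] = df["LoanAmount"].fillna(group_mean)

print(df)
```

Rows whose Education or Self_Employed is itself missing get no group mean, so fill those two columns first (for example with the mode).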


#33

How do I handle the outliers?


#34

Please let me know how to handle outliers.


#35

You can replace outliers with more appropriate values. For example, replace them with the median of the column.


#36

@shivatharun Outliers are treated with one of the following methods (imputation, capping or prediction):

Imputation - replaced with an appropriate measure of central tendency (mean, median, mode)
Capping - usually at the 95th or 97th percentile, depending on the data
Prediction - set to NA and predicted as a response variable
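
Here is a small sketch of the first two methods. The income values are made up, and flagging outliers with the 1.5 * IQR rule is just one common convention that I am assuming here, not something fixed by the methods above.

```python
import pandas as pd

# Toy income column with one extreme value (values are invented).
s = pd.Series(
    [100.0, 120.0, 130.0, 110.0, 125.0, 115.0, 105.0, 135.0, 140.0, 5000.0],
    name="ApplicantIncome",
)

# Capping: pull everything above the 95th percentile down to that value.
cap = s.quantile(0.95)
capped = s.clip(upper=cap)

# Imputation: flag outliers with the 1.5 * IQR rule, then replace with the median.
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
imputed = s.mask(is_outlier, s.median())

print(capped.max(), imputed.max())
```

For the prediction method you would set the flagged values to NaN (for example `s.mask(is_outlier)`) and then treat them like any other missing values to be modelled.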


#37

On what basis can we fill the null values in Gender? And in Dependents a value is given as 3+, so how do I handle that? Please help.


#38

Hi @utsav31,

Since both Gender and Dependents are categorical variables, you can use the mode of each variable to fill its missing values. This is one of many approaches for imputing missing values in categorical variables. You can also try other techniques.


#39

Hi @utsav31

For Gender, you can either fill with the mode or look at other variables. For example, create a condition such as: if the age is less than 25 and the status is married, then female; or: if applicant income > coapplicant income, then female.

Treat Dependents as a categorical variable, so 1, 2 and 3+ are not numbers but categories of this variable.


#41

Thank you, the Gender one really helped.
But for Dependents, should I treat the whole column as categorical, or only the specific rows where 3+ is given?


#42

Hey, thanks for that.


#43

Treat the whole column as categorical.
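
A minimal sketch of what that can look like, assuming Dependents holds the text values 0, 1, 2 and 3+ (the toy values below are made up):

```python
import pandas as pd

# Toy Dependents column with one missing entry (values are invented).
df = pd.DataFrame({"Dependents": ["0", "1", "3+", None, "0", "2"]})

# Fill the gap with the most frequent category.
df["Dependents"] = df["Dependents"].fillna(df["Dependents"].mode()[0])

# Make the whole column categorical; '3+' is just another level.
df["Dependents"] = pd.Categorical(
    df["Dependents"], categories=["0", "1", "2", "3+"], ordered=True
)

print(df["Dependents"].value_counts())
```

Marking the levels as ordered is optional. It only matters if you later want comparisons or an ordinal encoding.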