How to fill NaN values?

python2
data_science

#1

Hi! I am new to data science. Here is my approach to loan prediction: https://github.com/sch00lb0y/DataPractice/blob/master/LoanPrediction/LoanPredicion.ipynb
I dropped the rows containing NaN to make the prediction. Is there any way to fill the NaN values instead? If there is, I would like to hear your approach.
Cheers!!
SchoolBoy;-)
Edit 1:
I updated my repo with imputation, but it leads to a loss of accuracy.


#2

Hi @schoolboy,

There are many ways to handle missing values. These are a few of them:

  • Deletion
  • Mean/Mode/Median Imputation
  • Prediction Model Imputation
  • KNN Imputation

Refer to the missing values treatment section of this blog for a more in-depth understanding.
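As a rough illustration of three of the options listed above (deletion, mean imputation, KNN imputation), here is a minimal sketch in pandas and scikit-learn. The toy values are made up, and the column names are only borrowed from the loan data set; prediction-model imputation is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data; the values are invented for illustration.
df = pd.DataFrame({
    "ApplicantIncome": [5000.0, 3000.0, np.nan, 4000.0, 6000.0],
    "LoanAmount": [130.0, 100.0, 110.0, np.nan, 150.0],
})

# 1. Deletion: drop every row that has at least one missing value.
dropped = df.dropna()

# 2. Mean imputation: fill each column's gaps with that column's mean.
mean_filled = df.fillna(df.mean())

# 3. KNN imputation: fill each gap from the 2 most similar rows,
#    where similarity is measured on the columns that are present.
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(df),
    columns=df.columns,
    index=df.index,
)

print(dropped)
print(mean_filled)
print(knn_filled)
```

Deletion shrinks the data set, while the other two keep every row; which one works best depends on how much data is missing and why.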


#3

Thank you for your suggestion. I’ll work on imputation to improve the model.


#4

Adding to @jalFaizy's answer: in my experience, how you impute values matters very little. Mean/median/mode is usually good enough. For the particular case of training a tree-based model, another possibility is to impute a value that lies outside the feature's observed range. This lets the learning algorithm handle those rows while still retaining the fact that the value was missing.
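A minimal sketch of the out-of-range idea in pandas. The sentinel -999 is an arbitrary choice for illustration; it assumes that real loan amounts are never negative, so a tree can isolate the missing rows with a single split.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing values.
df = pd.DataFrame({"LoanAmount": [130.0, np.nan, 110.0, np.nan, 150.0]})

# Assumed sentinel: any value well below the smallest observed amount works.
SENTINEL = -999.0
assert SENTINEL < df["LoanAmount"].min()

# Replace the gaps with the sentinel; observed values are left untouched.
df["LoanAmount"] = df["LoanAmount"].fillna(SENTINEL)

print(df)
```

This is generally not a good fit for linear models, where the sentinel would be treated as a real, extreme value.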


#5

I would suggest using the mean/median/mode to fill missing values in numerical variables. For categorical variables, you can use the value that occurs most often.

Please let me know how you solve the issue.
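One way this rule could look in pandas is sketched below. It uses the median for numeric columns and the most frequent value for everything else; the toy data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one numeric and one categorical column.
df = pd.DataFrame({
    "LoanAmount": [130.0, np.nan, 110.0, 150.0],
    "Gender": ["Male", "Female", None, "Male"],
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric column: fill with the median (mean would work the same way).
        df[col] = df[col].fillna(df[col].median())
    else:
        # Categorical column: fill with the most frequent value.
        df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```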


#6

I like the different ideas for filling missing values here. I am working with the data in an Excel sheet and using a few of these approaches:

  1. Mean imputation (for ApplicantIncome and CoapplicantIncome).
  2. Gender: I checked the relationship between Gender and ApplicantIncome using one-way ANOVA and a t-test, and found that female applicants have a mean of 4643 and male applicants a mean of approximately 5446. Long story short, I used this condition in my Excel sheet to fill the missing Gender values: =IF(ISBLANK(Gender),IF(4643>ApplicantIncome,"Female","Male"),Gender)
  3. Credit_History: I applied the same concept as for Gender (in this case using Credit_History and Loan_Status): =IF(ISBLANK(Credit_History),IF(Loan_Status="N",0,1),Credit_History)
    I am not sure whether you can work this way, going from the dependent variable to an independent variable.
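For anyone working in Python rather than Excel, the two formulas above could be translated to pandas roughly as follows. The toy rows are invented; the threshold 4643 and the column names come from the formulas. One caveat: the Credit_History rule reads Loan_Status, which is the target, so it cannot be applied to rows where the loan status is still unknown.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, using the column names from the Excel formulas.
df = pd.DataFrame({
    "Gender": ["Male", None, None, "Female"],
    "ApplicantIncome": [6000, 3000, 5200, 4000],
    "Credit_History": [1.0, np.nan, np.nan, 0.0],
    "Loan_Status": ["Y", "N", "Y", "N"],
})

# =IF(ISBLANK(Gender),IF(4643>ApplicantIncome,"Female","Male"),Gender)
gender_guess = pd.Series(
    np.where(df["ApplicantIncome"] < 4643, "Female", "Male"), index=df.index
)
df["Gender"] = df["Gender"].fillna(gender_guess)  # only blanks are replaced

# =IF(ISBLANK(Credit_History),IF(Loan_Status="N",0,1),Credit_History)
credit_guess = pd.Series(
    np.where(df["Loan_Status"] == "N", 0.0, 1.0), index=df.index
)
df["Credit_History"] = df["Credit_History"].fillna(credit_guess)

print(df)
```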

PS: I am new to these concepts, so please forgive my naivety.

Thanks in advance!


#7

Hi all,

Are you following the above-mentioned steps for both NA and null values?


#8

Can anybody tell me how to deal with a missing categorical value? For example, 13 is blank in the image below.