How to delete an existing level of a categorical variable?

r

#1

I am currently trying to remove all the missing values from my data so that I can use it to build a classification model.

My current data:
    str(h)
    'data.frame':   614 obs. of  13 variables:
     $ Loan_ID          : Factor w/ 614 levels "LP001002","LP001003",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ Gender           : Factor w/ 3 levels "","Female","Male": 3 3 3 3 3 3 3 3 3 3 ...
     $ Married          : Factor w/ 3 levels "","No","Yes": 2 3 3 3 2 3 3 3 3 3 ...
     $ Dependents       : Factor w/ 5 levels "","0","1","2",..: 2 3 2 2 2 4 2 5 4 3 ...
     $ Education        : Factor w/ 2 levels "Graduate","Not Graduate": 1 1 1 2 1 1 2 1 1 1 ...
     $ Self_Employed    : Factor w/ 3 levels "","No","Yes": 2 2 3 2 2 3 2 2 2 2 ...
     $ ApplicantIncome  : int  5849 4583 3000 2583 6000 5417 2333 3036 4006 12841 ...
     $ CoapplicantIncome: num  0 1508 0 2358 0 ...
     $ LoanAmount       : int  NA 128 66 120 141 267 95 158 168 349 ...
     $ Loan_Amount_Term : int  360 360 360 360 360 360 360 360 360 360 ...
     $ Credit_History   : int  1 1 1 1 1 1 1 0 1 1 ...
     $ Property_Area    : Factor w/ 3 levels "Rural","Semiurban",..: 3 1 3 3 3 3 3 2 3 2 ...
     $ Loan_Status      : Factor w/ 2 levels "N","Y": 2 1 2 2 2 2 2 1 2 1 ...

For example, I replaced the missing values in the Gender variable with Female:


     h$Gender[which(h$Gender=='')]<-'Female'
     table(h$Gender)

           Female   Male 
         0    125    489 
     str(h)
    'data.frame':   614 obs. of  13 variables:
     $ Loan_ID          : Factor w/ 614 levels "LP001002","LP001003",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ Gender           : Factor w/ 3 levels "","Female","Male": 3 3 3 3 3 3 3 3 3 3 ...
     $ Married          : Factor w/ 3 levels "","No","Yes": 2 3 3 3 2 3 3 3 3 3 ...
     $ Dependents       : Factor w/ 5 levels "","0","1","2",..: 2 3 2 2 2 4 2 5 4 3 ...
     $ Education        : Factor w/ 2 levels "Graduate","Not Graduate": 1 1 1 2 1 1 2 1 1 1 ...
     $ Self_Employed    : Factor w/ 3 levels "","No","Yes": 2 2 3 2 2 3 2 2 2 2 ...
     $ ApplicantIncome  : int  5849 4583 3000 2583 6000 5417 2333 3036 4006 12841 ...
     $ CoapplicantIncome: num  0 1508 0 2358 0 ...
     $ LoanAmount       : int  NA 128 66 120 141 267 95 158 168 349 ...
     $ Loan_Amount_Term : int  360 360 360 360 360 360 360 360 360 360 ...
     $ Credit_History   : int  1 1 1 1 1 1 1 0 1 1 ...
     $ Property_Area    : Factor w/ 3 levels "Rural","Semiurban",..: 3 1 3 3 3 3 3 2 3 2 ...
     $ Loan_Status      : Factor w/ 2 levels "N","Y": 2 1 2 2 2 2 2 1 2 1 ...

I am still getting 3 levels in the Gender variable. I want to know why this happens and how I can reduce this to 2 levels.

#2

@harry - You can do this by re-creating the variable as a factor after replacing the missing values.

For the variable Gender:

    h$Gender <- factor(h$Gender)

This drops the unused empty level, so you are left with only 2 levels.
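
As for why the third level lingers: a factor stores its set of levels separately from its values, so assigning new values changes the data but leaves the levels attribute untouched until the factor is rebuilt. Here is a minimal sketch on a toy factor (the vector g is made up for illustration and is not the original data):

```r
# Toy factor with an empty-string level (illustrative data, not the original h)
g <- factor(c("Male", "", "Female", "Male"))
levels(g)            # the empty string "" is one of the three levels

# Replace the empty strings; the level "" stays in the levels attribute
g[g == ""] <- "Female"
table(g)             # the "" level is still listed, with a count of 0
nlevels(g)           # still 3

# Re-creating the factor keeps only the levels that actually occur
g <- factor(g)
nlevels(g)           # now 2
```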

Hope this helps!

Regards,
Hinduja


#3

Hi,

You can try droplevels (the column name is case-sensitive, so it must be h$Gender, not h$gender):

    h$Gender <- droplevels(h$Gender)
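
droplevels also has a data frame method, so if several columns have the same problem they can be cleaned in one call. A small sketch, assuming a toy data frame d with made-up values standing in for h:

```r
# Toy data frame (hypothetical values, standing in for h)
d <- data.frame(
  Gender  = factor(c("Male", "", "Female")),
  Married = factor(c("Yes", "No", ""))
)

# Replace the empty strings with an existing level in each column
d$Gender[d$Gender == ""]   <- "Female"
d$Married[d$Married == ""] <- "No"

# droplevels() on a data frame drops unused levels from every factor column
d <- droplevels(d)
sapply(d, nlevels)   # number of levels remaining in each column
```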

Hope this helps.