Should we convert binary/categorical variables to factors when doing regression?

machine_learning

#1

Hi,

This might be a generic question. I am trying to fit a logistic regression on a data set in R, and I have a binary column. Should I convert it to a factor (i.e. as.factor(column_X)) before building the model, or should I leave it as it is?

I am leaning toward the second option because R will one-hot (dummy) encode the factor internally anyway, which turns the binary column back into a numeric one.
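To make that concrete, here is a minimal sketch of how the two versions of the column look once R builds the design matrix. The values in column_X are made up for illustration, and it assumes R's default treatment contrasts, where the first factor level ("0") is the baseline.

```r
# Hypothetical binary column; the name column_X follows the question above,
# the values are invented for illustration.
column_X <- c(0, 1, 1, 0, 1)

# Design matrix when the column is left as numeric 0/1
X_num <- model.matrix(~ column_X)

# Design matrix when the column is converted to a factor first
# (default treatment contrasts: level "0" is the baseline, level "1" gets a dummy column)
X_fac <- model.matrix(~ as.factor(column_X))

print(X_num)
print(X_fac)

# Compare the numeric contents, ignoring row/column names
same_values <- all(unname(X_num) == unname(X_fac))
print(same_values)
```

Both matrices have an intercept column and one 0/1 column; only the column name differs.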

Thanks,
Satyadeep


#2

Hi @satyadeep9123,

You can simply encode your binary variable as 0/1 and use it as a numeric variable.


#3

Hi @PulkitS,

Thanks for the answer. The column values are already 0/1 and the column's data type is numeric. Should I convert it to a factor before building the model, or can I use it as a numeric binary column as it is?


#4

Hi @satyadeep9123,

There is no need to convert it into a factor: the variable has only two categories, and they are already represented by 0/1. Converting a variable to a factor and then one-hot encoding it is only necessary when it has more than two categories.
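A quick way to check this is to fit the model both ways and compare the coefficients. The sketch below uses simulated data, so the names x, y and g, the sample size, and the effect sizes are all assumptions for illustration rather than anything from the original data set.

```r
set.seed(42)  # simulated data, purely illustrative
n <- 200
x <- rbinom(n, 1, 0.5)                     # binary predictor stored as numeric 0/1
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))  # binary outcome

# Same logistic regression, predictor as numeric vs. as factor
fit_num <- glm(y ~ x, family = binomial)
fit_fac <- glm(y ~ factor(x), family = binomial)

# Compare the estimates, ignoring the coefficient names
coef_num <- unname(coef(fit_num))
coef_fac <- unname(coef(fit_fac))
print(all.equal(coef_num, coef_fac))

# With three categories the factor is what produces the dummy columns:
# the design matrix has an intercept plus one column per non-baseline level.
g <- factor(rep(c("a", "b", "c"), length.out = n))
print(colnames(model.matrix(~ g)))
```

For the binary column the two fits should give the same estimates, differing only in the coefficient name. For the three-level variable, leaving the categories as numeric codes would instead be treated as a single continuous predictor, which is why the factor conversion matters there.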


#5

Thanks @PulkitS.