Linear Regression on categorical and continuous data


#1

I have a question: how can we apply a linear regression model to a dataset that has both continuous and categorical variables?
Please help.


#2

hi @Apdxt

Linear models only accept numerical values, so you need to encode categorical variables.

Encoding can be done in different ways like label encoding or dummy encoding.
For more details, you can read this blog post on how to deal with categorical variables: https://www.analyticsvidhya.com/blog/2015/11/easy-methods-deal-categorical-variables-predictive-modeling/
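
To make the difference between the two encodings concrete, here is a minimal sketch in plain Python. The `colors` column and its category names are made up for illustration; they are not from any real dataset.

```python
# Hypothetical toy column; the category names are assumptions for illustration.
colors = ["red", "green", "blue", "green"]

# Sort the distinct categories so the encoding is stable across runs.
levels = sorted(set(colors))  # ['blue', 'green', 'red']

# Label encoding: map each category to a single integer.
label_map = {level: i for i, level in enumerate(levels)}
label_encoded = [label_map[c] for c in colors]

# Dummy (one-hot) encoding: one 0/1 column per category.
dummy_encoded = [[1 if c == level else 0 for level in levels] for c in colors]

print(label_encoded)  # [2, 1, 0, 1]
print(dummy_encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Note that label encoding turns the categories into numbers with an order (blue < green < red here), and a linear model will treat that order as meaningful. For categories without a natural order, dummy encoding is usually the safer choice.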

Cheers!
Shubham


#3

Hi Apdxt,

To give you a clear understanding of how it works, here is my explanation.
First, some terminology, just to be clear:

dependent variable == outcome == "$y$" in regression formulas such as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
independent variable == predictor == one of the "$x_k$" in regression formulas such as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$

So in most situations the type of regression depends on the type of the dependent (outcome, or "$y$") variable. For example, linear regression is used when the dependent variable is continuous, logistic regression when it is categorical with 2 categories, and multinomial regression when it is categorical with more than 2 categories. The predictors can be anything (nominal or ordinal categorical, or continuous, or a mix).

However, do note that most software requires you to recode categorical predictors to a binary numeric system. For a two-level variable this just means coding, say, sex as 0 for females and 1 for males, or vice versa. For categorical variables with more than 2 levels you'll need to recode them into $L - 1$ dummy variables, where $L$ is the number of levels; each dummy is 1 when the sample is in the corresponding category and 0 otherwise. This way each individual (sample) is represented by a 1 for the dummy variable of the category he/she belongs to and a 0 for the others, or by a 0 for all dummies when he/she is part of the reference group.
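
As a rough sketch of the recoding described above, the following builds $L - 1$ dummies by hand and fits an ordinary least squares model with NumPy. The variable names (`age`, `city`), the data, and the "true" coefficients are all assumptions made up for illustration; the outcome is generated without noise so that the fit simply recovers those coefficients.

```python
import numpy as np

# Hypothetical toy data: one continuous predictor and one 3-level categorical predictor.
age = np.array([25.0, 32.0, 47.0, 51.0, 38.0, 29.0])
city = ["A", "B", "C", "A", "B", "C"]

# L = 3 levels -> L - 1 = 2 dummies; the first level ("A") is the reference group.
levels = sorted(set(city))  # ['A', 'B', 'C']
dummies = np.array(
    [[1.0 if c == lvl else 0.0 for lvl in levels[1:]] for c in city]
)  # columns: is_B, is_C; a row of all zeros means city "A"

# Synthetic, noise-free outcome with assumed coefficients (illustration only).
y = 10.0 + 2.0 * age + 5.0 * dummies[:, 0] - 3.0 * dummies[:, 1]

# Design matrix: intercept column, continuous predictor, then the dummies.
X = np.column_stack([np.ones(len(age)), age, dummies])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 6))  # approximately [10, 2, 5, -3]
```

Each dummy coefficient is read relative to the reference group: here city "B" shifts the prediction by about +5 and city "C" by about -3 compared with city "A" at the same age. In practice, a helper such as pandas `get_dummies` with `drop_first=True` produces the same kind of $L - 1$ coding.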