How to handle a multi-level factor variable in linear regression


#1

I am fitting a linear regression in which one independent variable is a factor (categorical) variable with many levels. For example, consider the state code for US states: it has 50 levels.
In the coefficient table I then get roughly 50 rows for this one variable (one per level other than the reference level), each treated as a separate variable, which means roughly 50 coefficients to interpret. If I have 10 such categorical variables, it gets even more complicated (about 50*10 = 500 coefficients).

Any thoughts on the best way to deal with such a scenario? Or am I missing something fundamental?


#2

@sumit,

This can be handled in several ways:

  • Create dummy variables for the categorical variables
  • Combine the levels of a categorical variable into broader groups, then create dummy variables for the groups
  • Build a decision tree to identify the significant levels of the categorical variables, then fit a linear regression for each node
  • You can also use a combination of regression and ANOVA.
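
To make the first two options concrete, here is a minimal sketch in plain Python. The helper names (`group_rare_levels`, `dummy_encode`), the `min_count` threshold, and the toy state codes are all made up for illustration; in practice a library routine would normally do the encoding. It only shows how merging rare levels before dummy coding reduces the number of columns, and so the number of coefficients.

```python
from collections import Counter

def group_rare_levels(values, min_count, other_label="Other"):
    """Merge levels seen fewer than min_count times into one broader level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

def dummy_encode(values, reference=None):
    """Return (column_names, rows) of 0/1 indicators.

    The reference level gets no column, so k levels give k - 1 dummies.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows

# Hypothetical toy data: state codes for eight observations.
states = ["CA", "CA", "CA", "TX", "TX", "NY", "WY", "VT"]

# Option 1: one dummy per level except the reference level.
columns, design = dummy_encode(states)

# Option 2: merge rare states into "Other" first, then dummy-encode.
grouped = group_rare_levels(states, min_count=2)
grouped_columns, grouped_design = dummy_encode(grouped, reference="CA")

print(len(columns), "dummy columns before grouping:", columns)
print(len(grouped_columns), "dummy columns after grouping:", grouped_columns)
```

The grouping rule here is purely frequency based. Grouping by domain knowledge (for example states into regions) follows the same pattern with a lookup table instead of a count threshold.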

Thanks,
Pravin


#3

Thanks @pravin for the details. I will try the suggested options.


#4

Hi Pravin,
Can you explain this approach:
Create a decision tree to identify significant levels of categorical variables and after that perform linear regression for each node
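
One possible reading of this approach, as a rough sketch and not necessarily what was meant: let a tree split the levels into groups with similar response values, then fit a separate regression on the rows in each group. The code below does a single split only, and everything in it (the helper names `best_level_split` and `fit_line`, the numeric predictor `x`, and the toy data) is an assumption made for illustration.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_level_split(levels, ys):
    """One tree split on a categorical variable.

    Levels are ordered by their mean response, every cut point in that
    ordering is tried, and the two groups of levels with the smallest
    total within-group SSE are returned.
    """
    by_level = {}
    for lvl, y in zip(levels, ys):
        by_level.setdefault(lvl, []).append(y)
    ordered = sorted(by_level, key=lambda lvl: mean(by_level[lvl]))
    best = None
    for cut in range(1, len(ordered)):
        left, right = ordered[:cut], ordered[cut:]
        cost = (sse([y for lvl in left for y in by_level[lvl]])
                + sse([y for lvl in right for y in by_level[lvl]]))
        if best is None or cost < best[0]:
            best = (cost, set(left), set(right))
    return best[1], best[2]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one numeric predictor."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical toy data: a state code, one numeric predictor, a response.
states = ["CA", "CA", "NY", "NY", "TX", "TX", "WY", "WY"]
x = [1, 2, 1, 2, 1, 2, 1, 2]
y = [10, 12, 11, 13, 2, 3, 1, 2]

low, high = best_level_split(states, y)

# Fit one regression per node (group of levels) found by the split.
fits = {}
for name, node in (("low", low), ("high", high)):
    xs = [xi for s, xi in zip(states, x) if s in node]
    ys = [yi for s, yi in zip(states, y) if s in node]
    fits[name] = fit_line(xs, ys)
    print(name, sorted(node), fits[name])
```

A real tree would keep splitting and would need a stopping rule (minimum node size, pruning), and each node needs enough rows to support its own regression.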