# How to find the importance of a single categorical variable when all independent variables are categorical?

#1

Hi,

Does summing up the individual importances of a categorical variable make sense when there are only categorical variables in my set of independent variables? I was not convinced by the replies in the above query.

Regards
Akshay

#2

Try Boruta

#3

Thanks for the suggestion. I will definitely try Boruta, but is there something similar in Python, since I built the model in Python? Calculating in R is not a big deal but will take some time. Which of the two has more alternatives for solving my problem?

Regards
Akshay

#4

There are a few ways in which you can try reducing the features to make the problem less complex.

1. You can find out the features which are highly correlated, and drop some of them.
2. If you have an order in the categorical variables, you can go for label encoding instead of creating dummies.
3. Create a new feature combining multiple other features, and drop the previous features.
4. You can use lasso regression and then find feature importance.

Is there any particular dataset you're working on where you faced this problem? If yes, can you share the same?

#5

Thanks for taking the time to respond, Aishwarya!

A few more questions:

1. How to find out correlation - any methods?
2. What do you mean by order in categ. variables? As per my understanding, a categorical variable itself has discrete categories. #noob
3. When you talk about combining new features, is it about the levels within a particular categ. variable or different categ. variables altogether?

Regards
Akshay

#6

1. In Python, pandas provides the corr() function to find the correlation between continuous variables. For categorical variables, you can use the chi-square test.
2. Suppose you have a categorical variable "Quality" with the levels Excellent, Good, Fair, Average and Poor. You can replace the levels with numbers: Excellent is 5, Good is 4, and so on.
3. You can refer to the following discussion thread:
Modelling technique for categorical predictor and continuous target
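
To make the first two points concrete, here is a minimal sketch in plain Python (standard library only). The function names, the `colour` and `size` columns and the sample values are made up for illustration, and the numeric scale for "Quality" simply continues the "Excellent is 5, Good is 4" pattern. It only computes the chi-square statistic and the degrees of freedom; for a p-value you could look the statistic up in a chi-square table or use `scipy.stats.chi2_contingency`, which also applies a continuity correction by default for 2x2 tables.

```python
from collections import Counter

# Point 2: ordinal (label) encoding for an ordered categorical variable.
# Assumed scale, continuing "Excellent is 5, Good is 4, and so on".
QUALITY_ORDER = {"Poor": 1, "Average": 2, "Fair": 3, "Good": 4, "Excellent": 5}


def encode_quality(values):
    """Replace each quality label with its rank (KeyError on unknown labels)."""
    return [QUALITY_ORDER[v] for v in values]


# Point 1: Pearson chi-square statistic for two categorical columns.
def chi_square_statistic(xs, ys):
    """Return (statistic, degrees_of_freedom) for two equally long columns."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same length")
    n = len(xs)
    observed = Counter(zip(xs, ys))  # contingency table; missing cells count as 0
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical sample data, not taken from any real dataset.
colour = ["red", "red", "blue", "blue", "red", "blue", "red", "blue"]
size = ["S", "S", "L", "L", "S", "L", "L", "S"]

encoded = encode_quality(["Good", "Poor", "Excellent"])
stat, dof = chi_square_statistic(colour, size)
print(encoded, stat, dof)
```

A larger statistic for the same degrees of freedom means stronger evidence that the two variables are not independent.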

Happy Learning!

Regards,
Aishwarya

#7

Hey @AishwaryaSingh,

Unlike @cachu's problem, my problem has a binary target. I am currently working out the correlation between 8 categorical variables that are split into more than 50 feature labels. I hope I can drop some features once I have the correlations, and then combine some of the rest.

Anything you would like to add?

Regards
Akshay

#8

Yes, you can drop the highly correlated variables, irrespective of the type of target variable.
Your approach seems fine to me.
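
One possible way to sketch this in plain Python is shown below. It is only an illustration under some assumptions: it uses Cramér's V (a chi-square based measure scaled to the range 0 to 1) as the "correlation" between two categorical variables, which is a choice made here rather than something prescribed above, and the 0.8 threshold, the function names and the sample columns are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical columns (0 = none, 1 = perfect)."""
    n = len(xs)
    rows, cols = Counter(xs), Counter(ys)
    k = min(len(rows), len(cols)) - 1
    if n == 0 or k == 0:
        return 0.0  # an empty or constant column shows no association
    observed = Counter(zip(xs, ys))  # missing cells count as 0
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def drop_associated(columns, threshold=0.8):
    """Greedily drop the later column of every pair with V >= threshold.

    `columns` maps a column name to its list of values; returns kept names.
    """
    dropped = set()
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue
        if cramers_v(columns[a], columns[b]) >= threshold:
            dropped.add(b)  # keep the earlier column of the pair
    return [name for name in columns if name not in dropped]


# Hypothetical data: "region" is fully determined by "city".
data = {
    "city": ["A", "A", "B", "B", "C", "C"],
    "region": ["north", "north", "south", "south", "east", "east"],
    "plan": ["x", "y", "x", "y", "x", "y"],
}

kept = drop_associated(data)
print(kept)
```

Which column of a pair to keep is a judgment call; this sketch simply keeps whichever comes first. With very few observations per cell the value can be unreliable, so it is worth checking the contingency tables as well.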

Happy Learning!

#9

I tried the chi-square test you mentioned, but it failed because the lack of data leads to zero-frequency rows in the contingency tables. I also tried Fisher's exact test, but it has the constraint of constant sums, which does not hold in my case.

1. What else can be tried out to find out highly correlated variables as mentioned in the previous reply?
2. How to find out the correlation parameter value?

Happily learning!