Dimension reduction techniques for categorical variables?


Hi All,

Is there any dimension reduction technique available for categorical variables? My understanding is that PCA cannot be used on categorical variables. Please correct me if I am wrong.

Has anyone used a method other than variable importance plots? The problem I face with variable importance plots is that each time window of the dataset gives me different importances. I tried k-fold cross-validation, but it did not make much difference.

Karthikeyan P


Hi @karthe1,

We can apply PCA to a categorical variable once we transform it into a suitable numeric format. For example, apply one-hot encoding to the variable first and then run PCA on the resulting indicator columns.
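A minimal sketch of that idea, assuming scikit-learn is available. The toy data, column meanings, and the choice of 2 components are made up for illustration and are not from the original question:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): two unordered categorical columns.
X = np.array([
    ["IN", "red"],
    ["US", "blue"],
    ["IN", "green"],
    ["FR", "red"],
    ["US", "red"],
    ["FR", "blue"],
])

# One-hot encode: each distinct category becomes its own 0/1 column.
# The encoder returns a sparse matrix by default, so convert it to a
# dense array because PCA centers the data and expects dense input here.
encoder = OneHotEncoder(handle_unknown="ignore")
X_onehot = encoder.fit_transform(X).toarray()

# Project the indicator columns onto 2 principal components
# (the number 2 is an arbitrary choice for this sketch).
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_onehot)

print(X_onehot.shape, X_reduced.shape)
print(pca.explained_variance_ratio_)
```

Converting to a dense array is only practical when the number of categories is modest. With many levels the dense matrix can get large.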

Are you seeing different variable importances across different folds?
Or do the importances change when you change the model parameters?



I also have a similar problem. My categorical variables are not ordered. They are something like country, states, … For that reason, I transform them into binary indicator columns using LabelBinarizer (from sklearn), which ends up creating a sparse matrix. So I wonder if there are any tricks to reduce the dimension.
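One option that may fit this sparse case is truncated SVD, which accepts a sparse matrix directly and does not need it converted to dense form. The sketch below is only an illustration under assumptions: the country/state values are randomly generated placeholders, the 10 components are an arbitrary choice, and it uses OneHotEncoder rather than LabelBinarizer, since scikit-learn documents LabelBinarizer as a tool for labels and OneHotEncoder as the one for feature columns.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: 1000 rows with a "country" and a "state" column.
rng = np.random.default_rng(0)
countries = np.array([f"country_{i}" for i in range(50)])
states = np.array([f"state_{i}" for i in range(200)])
X = np.column_stack([
    rng.choice(countries, size=1000),
    rng.choice(states, size=1000),
])

# One-hot encode; the result stays a scipy sparse matrix with
# exactly one non-zero entry per original column in each row.
X_sparse = OneHotEncoder(handle_unknown="ignore").fit_transform(X)

# Truncated SVD works on the sparse matrix as-is.
# n_components=10 is an arbitrary choice for this sketch.
svd = TruncatedSVD(n_components=10, random_state=0)
X_reduced = svd.fit_transform(X_sparse)

print(X_sparse.shape, X_reduced.shape)
print(svd.explained_variance_ratio_.sum())
```

How many components to keep depends on the data. Checking the cumulative `explained_variance_ratio_` is one common way to pick a value.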