One Hot encoding in PCA

Hi, I am following up on an existing thread opened in 2015, with a new question added in 2018 that seems to have been left unanswered: "Error: cannot rescale a constant/zero column to unit variance". I am converting categorical variables with one-hot encoding and then feeding all the predictors (as.numeric) into the prcomp() function, but I get the error: “error in prcomp.default(training_setx[-19], center = T, scale. = T) : cannot rescale a constant/zero column to unit variance”. I know the issue is with columns that are constantly 0, but that is precisely what one-hot encoding produces, so how can I overcome this and apply PCA? I tried scaling the variable after one-hot encoding but still received the same error. This is a sample of my script:
My_Data$My_variable <- to_categorical(as.numeric(My_Data$My_variable))  # works fine
prcomp(My_Data, center = TRUE, scale. = TRUE)  # RETURNS THE ERROR!
One more question: is it correct to apply one-hot encoding to categorical predictors in this case, or would it suffice to simply convert them to numeric? Thanks very much, and I hope to hear from you…

Hi @grasso, starting from the end: just converting the variables to numeric, i.e. replacing “bad” with 0, “good” with 1, and “better” with 2, definitely won't solve your problem, because you end up feeding your model an ordinal relationship for a variable that is purely categorical, and you do not mean 2 > 1 or 1 > 0 or anything like that.
Now, coming to PCA not working: are the variables' data types properly numeric? There is also something called Sparse PCA. I am not totally sure I understood your issue correctly, but see the link below.
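To make the difference concrete, here is a minimal sketch in base R. The `quality` vector and its levels are made up for illustration and are not from the original data. Note that `as.integer()` on a factor yields codes starting at 1 rather than 0, but the implied ordering is the same.

```r
# Hypothetical example data; the variable name and levels are illustrative only.
quality <- factor(c("bad", "good", "better", "good"),
                  levels = c("bad", "good", "better"))

# Integer coding imposes an order and equal spacing: bad = 1, good = 2, better = 3.
as_integer <- as.integer(quality)

# One-hot (indicator) coding: one 0/1 column per level, with no implied order.
# The "- 1" removes the intercept so that every level gets its own column.
one_hot <- model.matrix(~ quality - 1)

print(as_integer)
print(one_hot)
```

Each row of `one_hot` contains exactly one 1, so the levels are distinguished without any level being "greater" than another.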
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.SparsePCA.html
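Staying with R's `prcomp()`, the error in the question is raised when `scale. = TRUE` and at least one column has zero variance. A possible workaround, sketched below on a made-up data frame (`toy` and its columns are hypothetical stand-ins for `My_Data`), is to drop constant columns before running PCA. The all-zero column here comes from a factor level that never occurs; whether that is the cause in the original data is only an assumption, since the data is not shown.

```r
# Hypothetical data frame standing in for My_Data; all names are illustrative.
set.seed(1)
toy <- data.frame(
  x1  = rnorm(6),
  x2  = rnorm(6),
  grp = factor(c("a", "b", "a", "b", "a", "b"), levels = c("a", "b", "c"))
)

# One-hot encode the factor. Level "c" never occurs, so its column is all zeros.
X <- model.matrix(~ x1 + x2 + grp - 1, data = toy)

# Keep only the columns whose variance is non-zero.
col_var <- apply(X, 2, var)
X_ok <- X[, col_var > 0, drop = FALSE]

# PCA now runs with centering and scaling.
pca <- prcomp(X_ok, center = TRUE, scale. = TRUE)
summary(pca)
```

Printing `names(col_var)[col_var == 0]` before filtering shows which columns were constant, which can help locate the encoding step that produced them.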

All the best :slight_smile:

Thank you, Palbha. After further reading I came to the conclusion, perhaps inaccurate, that PCA works best (or only) with numeric predictors rather than categorical ones. In fact, as you also said, I do not intend one-hot encoding to let the model “think” that 1 > 0, since I want to differentiate the elements rather than make one greater than the other. Thanks for taking the time to answer my question.
