Hi, I am following up on an existing thread, opened in 2015, with a new question from 2018 that seems to have been left unanswered: "Error: cannot rescale a constant/zero column to unit variance". Basically, I am converting categorical variables with one-hot encoding and then feeding all the predictors (as.numeric) into the prcomp() function, but I get this error:

`Error in prcomp.default(training_setx[-19], center = T, scale. = T) : cannot rescale a constant/zero column to unit variance`

I know the issue is caused by columns that are constant at 0, but that is precisely what one-hot encoding produces, so how can I overcome this and apply PCA? I tried scaling the variables after one-hot encoding but still got the same error. This is a sample of my script:

```r
My_Data$My_variable <- to_categorical(as.numeric(My_Data$My_variable)) # works fine

prcomp(My_Data, center = TRUE, scale. = TRUE) # RETURNS THE ERROR!
```
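That error comes from a simple check inside `prcomp()`: with `scale. = TRUE` every column is divided by its standard deviation, so any column whose values are all identical (standard deviation 0) stops the call. A one-hot column is normally a mix of 0s and 1s, so it only becomes constant when a level never occurs in the data (all 0s) or occurs in every row (all 1s). One plausible culprit here, assuming `to_categorical()` is the keras R function: it expects 0-based class indices, while `as.numeric()` on a factor returns 1 to K, so the first output column (class 0) would be all zeros; subtracting 1 before the call should avoid that. The sketch below uses only base R, with made-up data (`colour`, `x`) and a hand-rolled one-hot step in place of keras, to reproduce the error and then drop constant columns before calling `prcomp()`.

```r
# Made-up example: "blue" is a declared level that never occurs,
# so its one-hot column will be all zeros.
colour <- factor(c("red", "green", "red", "green", "red"),
                 levels = c("red", "green", "blue"))
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)

# One-hot encode by hand: one 0/1 column per factor level
onehot <- sapply(levels(colour), function(l) as.numeric(colour == l))
X <- cbind(onehot, x = x)

# Reproduces the error from the question
err <- tryCatch(prcomp(X, center = TRUE, scale. = TRUE),
                error = function(e) conditionMessage(e))
print(err)

# Keep only columns that take more than one distinct value
keep <- apply(X, 2, function(col) length(unique(col)) > 1)
X_ok <- X[, keep, drop = FALSE]
colnames(X_ok)   # "red" "green" "x"

pca <- prcomp(X_ok, center = TRUE, scale. = TRUE)
summary(pca)
```

Dropping a constant column loses nothing, since it carries no information for PCA. The caret package's `nearZeroVar()` offers a ready-made version of the same check, including near-constant columns.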

I also have a question: is it correct to apply one-hot encoding to the categorical predictors in this case, or would it suffice to just turn them into numbers? Thanks very much, and I hope to hear from you…

# One-Hot Encoding in PCA

Hi @grasso, starting from the end: just turning the variables into numbers, i.e. replacing "bad" with 0, "good" with 1 and "better" with 2, definitely won't solve your problem, because you would be feeding your model an ordinal relationship between the values of a purely categorical variable, when you do not mean 2 > 1 > 0 or anything like that.
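A quick way to see the ordering problem is to compare how far apart the three labels end up under each encoding, since PCA only ever sees the resulting numbers. This is a minimal base-R sketch using the labels from the example above; `labels`, `int_code` and `onehot` are illustrative names.

```r
# The three labels from the example, in the order bad, good, better
labels <- factor(c("bad", "good", "better"),
                 levels = c("bad", "good", "better"))

# Integer coding: bad = 0, good = 1, better = 2
int_code <- as.numeric(labels) - 1
d_int <- as.matrix(dist(int_code))
d_int
# bad-good and good-better are 1 apart, but bad-better is 2 apart

# One-hot coding: one 0/1 column per level
onehot <- sapply(levels(labels), function(l) as.numeric(labels == l))
d_onehot <- as.matrix(dist(onehot))
d_onehot
# every pair of distinct labels is sqrt(2) (about 1.414) apart
```

With one-hot encoding no category sits "between" two others. For labels that do have a natural order, integer coding can be a deliberate choice; the concern is for categories with no inherent order.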

As for PCA not working: are all the variable types really numeric? There is also a technique called Sparse PCA. I am not completely sure I understood your issue correctly, but see the link below.

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.SparsePCA.html
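On the type question: `prcomp()` needs a purely numeric matrix, so it is worth checking which columns of the data frame are not numeric before encoding. A minimal base-R sketch, where `df` and the helper `to_onehot()` are hypothetical names used only for illustration:

```r
# Hypothetical data frame mixing numeric and categorical columns
df <- data.frame(
  size   = c(1.5, 2.0, 3.1, 0.7),
  colour = c("red", "green", "red", "blue"),
  grade  = factor(c("a", "b", "a", "b"))
)

# Which columns are not numeric?
is_num <- vapply(df, is.numeric, logical(1))
names(df)[!is_num]
# "colour" "grade"

# Hypothetical helper: one 0/1 column per level of a categorical vector
to_onehot <- function(v) {
  f <- factor(v)
  sapply(levels(f), function(l) as.numeric(f == l))
}

# Numeric columns as-is, categorical ones expanded into indicator columns
X <- cbind(as.matrix(df[is_num]),
           do.call(cbind, lapply(df[!is_num], to_onehot)))
str(X)
```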

All the best

Thank you, Palbha. After further reading I came to the conclusion, perhaps an inaccurate one, that PCA works best (or only) with numeric predictors rather than categorical ones. In fact, as you also said, I do not intend the encoding to let the model "think" that 1 > 0, since I want to distinguish the categories rather than rank one above the other. Thanks for taking the time to answer my question.