Is it required to remove correlated variables before performing PCA?



Is it required to remove correlated variables before performing PCA? I don’t think it is necessary, but it is stated as necessary in question 10 of "40 Interview Questions asked at Startups in Machine Learning / Data Science".


PCA is used to remove multicollinearity from the data. As far as I know, there is no point in removing correlated variables beforehand. If there are correlated variables, PCA replaces them with a principal component that explains the maximum variance.
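
To illustrate the point about multicollinearity, here is a minimal sketch, assuming NumPy is available and using synthetic data invented for this example (x1, x2, x3 are hypothetical variables, with x2 built to be strongly correlated with x1). It runs PCA by eigendecomposition of the correlation matrix and checks that the resulting component scores are uncorrelated with one another.

```python
import numpy as np

# Synthetic data (an assumption for illustration): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # strongly correlated with x1
x3 = rng.normal(size=n)                   # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

# Standardize each column so PCA works on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via eigendecomposition of the correlation matrix.
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the data onto the principal components.
scores = Z @ eigvecs
score_corr = np.corrcoef(scores, rowvar=False)
explained = eigvals / eigvals.sum()

print("correlation of x1 and x2:", round(corr[0, 1], 3))
print("explained variance ratio:", np.round(explained, 3))
print("max off-diagonal correlation between components:",
      np.abs(score_corr - np.eye(3)).max())
```

In this sketch the correlated pair is largely absorbed into the first component, and the last component carries very little variance, so it could be dropped without removing either original variable by hand.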


Yes, if you have several highly correlated variables, then choose between them. Otherwise, the redundant variables end up weighting several eigenvectors differentially.
If you have only some correlation between variables, that is OK; proceed with PCA.
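
The weighting effect described in this answer can be sketched as follows, assuming NumPy and synthetic data made up for the example (a and b are hypothetical independent variables, and explained_ratio is a helper defined here, not a library function). It compares the share of variance taken by the first component when a appears once versus when near-duplicates of a are also included.

```python
import numpy as np

def explained_ratio(X):
    """Explained variance ratios of PCA on the correlation matrix of X."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # descending order
    return eigvals / eigvals.sum()

# Synthetic data (an assumption for illustration): two independent variables.
rng = np.random.default_rng(1)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)

# Two noisy near-copies of a, i.e. highly correlated redundant variables.
a_copies = [a + 0.05 * rng.normal(size=n) for _ in range(2)]

balanced = np.column_stack([a, b])              # a and b each appear once
redundant = np.column_stack([a, *a_copies, b])  # a is effectively counted three times

print("first component share, balanced: ", round(explained_ratio(balanced)[0], 3))
print("first component share, redundant:", round(explained_ratio(redundant)[0], 3))
```

With the redundant columns included, the first component is dominated by the group of near-copies, even though the underlying data still has only two independent sources of variation. Whether that matters depends on what the components are used for.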


Quite the opposite: to use PCA, it is a requirement that the variables be correlated. You should run a series of tests that demonstrate correlation among them, not just the Pearson correlation.


Removing them makes your models less biased.


Could you please explain how removing collinearity makes a model less biased? Please explain in a little detail.
Thanks in advance.


Answer: First, do you know what correlation and collinearity mean? Once you know the answer, you will understand why.
The two work the same, but …!!!

OK, when you have two columns for temperature (S / F), they are highly correlated,
so you can remove one of them, which makes your model faster and not biased toward those two temperature scales.
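
A minimal sketch of the temperature example, assuming NumPy and assuming the two scales are Celsius and Fahrenheit (the post does not spell them out); the readings are randomly generated for illustration. Because one column is an exact linear function of the other, the correlation is 1 and the second principal component carries no variance at all.

```python
import numpy as np

# Hypothetical temperature readings; the Celsius/Fahrenheit pairing is an assumption.
rng = np.random.default_rng(2)
celsius = rng.uniform(-10, 35, size=200)
fahrenheit = celsius * 9 / 5 + 32        # exact linear conversion

X = np.column_stack([celsius, fahrenheit])
corr = np.corrcoef(X, rowvar=False)

# Eigenvalues of the correlation matrix, largest first.
eigvals = np.linalg.eigvalsh(corr)[::-1]

print("correlation between the two columns:", corr[0, 1])
print("eigenvalues of the correlation matrix:", eigvals)
```

Here the second column adds no information, so dropping it by hand and letting PCA discard the zero-variance component lead to the same one-dimensional description of temperature.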