Why do the variables have to be standardised prior to applying PCA?




Before applying PCA to a dataset, we have to standardize the variables. Is it because, when we multiply by an identity matrix, we need to have similar values in X (the dataset)?
I am not quite clear about this concept, so kindly help me here!

Why is it necessary to standardize the data in PCA?


You are right that we should standardize the variables before applying PCA. PCA looks for the directions of greatest variance in the data, so without standardization it gives more emphasis to variables with higher variances than to variables with very low variances when identifying the principal components.

Let’s say your data set has variables in different units, for example one in kilometres (km) and another in centimetres (cm), and both undergo the same physical change. The variable in km will show only a minor numerical change, whereas the one in cm will show a much larger one (1 km is 100,000 cm). In this case, if we do not standardize the variables, PCA will give higher preference to the centimetre variable.
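
To make the unit effect concrete, here is a minimal sketch in plain Python. The data values and the helper names (`covariance`, `first_pc`) are made up for illustration and are not from any library. For two variables, the direction of the first principal component can be computed directly from the 2x2 covariance matrix, so no PCA package is needed.

```python
import math
from statistics import mean

def covariance(xs, ys):
    """Sample covariance of two equally long lists (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def first_pc(xs, ys):
    """Unit vector of the first principal component for two variables.

    Uses the closed-form angle of the major axis of a 2x2 covariance matrix.
    """
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

# Hypothetical lengths in metres; both variables have the same spread.
a_m = [1000, 2000, 3000, 4000, 5000]
b_m = [2000, 1000, 4000, 3000, 5000]

# Same units: both variables get equal weight in the first component.
pc_same_units = first_pc(a_m, b_m)

# Mixed units: a in kilometres, b in centimetres.
a_km = [m / 1000 for m in a_m]
b_cm = [m * 100 for m in b_m]
pc_mixed_units = first_pc(a_km, b_cm)

print("same units :", pc_same_units)   # weights are (almost exactly) equal
print("mixed units:", pc_mixed_units)  # nearly all weight on the cm variable
```

The underlying measurements are identical in both runs. Only the units differ, yet the first component swings almost entirely towards the centimetre variable.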

In another scenario, the variables have the same unit of measurement, but the values of one variable lie in the range 70-130 while those of another lie between 2 and 8 for all records. Here PCA will give more weight to the first variable, since its variance will typically be much larger. Standardization is required to tackle these issues.
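
The effect of standardization on this second scenario can be sketched the same way. The numbers below are invented to fall in the 70-130 and 2-8 ranges, and `standardize`, `covariance` and `first_pc` are illustrative helpers, not library functions. Standardizing here means subtracting the mean and dividing by the sample standard deviation (a z-score), which is one common choice.

```python
import math
from statistics import mean, stdev

def standardize(xs):
    """Z-score: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def covariance(xs, ys):
    """Sample covariance of two equally long lists (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def first_pc(xs, ys):
    """Unit vector of the first principal component for two variables."""
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

# Hypothetical records: x1 lies in 70-130, x2 lies in 2-8.
x1 = [70, 85, 100, 115, 130]
x2 = [3, 2, 6, 5, 8]

# Raw data: x1 has a far larger variance, so it dominates the component.
pc_raw = first_pc(x1, x2)

# Standardized data: both variables have unit variance and equal weight.
pc_std = first_pc(standardize(x1), standardize(x2))

print("raw         :", pc_raw)  # weight on x1 is close to 1
print("standardized:", pc_std)  # the two weights are (nearly) equal
```

After standardization every variable has variance 1, so PCA is effectively run on the correlation matrix and no variable dominates purely because of its scale.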



Nice explanation