Using Target Variable to perform Principal Component Analysis in R



This article presents a technique for incorporating the target variable when performing Principal Component Analysis, and shows how doing so can significantly reduce the number of principal components while still capturing the maximum variance.

You can read the article in detail over here.



Hi everybody,
I want to add a question on this particular topic (PCA). How can we choose the right number of components after performing PCA on a data set in Python?
Thanks in advance.


To choose the right number of principal components (PCs), you can look at the amount of variance each PC explains:
# pca is assumed to be an already fitted scikit-learn PCA object
# as the variance decreases, the importance of that particular PC decreases
var = pca.explained_variance_ratio_

Or you can check the cumulative variance of the principal components:
import numpy as np
import matplotlib.pyplot as plt
# once the plot saturates, the remaining PCs add very little
plt.plot(np.cumsum(np.round(pca.explained_variance_ratio_, decimals=4) * 100))
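If you would rather get a number than read it off the plot, here is a minimal sketch of the same idea using only NumPy. The function names (explained_variance_ratio, n_components_for), the 95% threshold and the synthetic data are my own assumptions for illustration; with scikit-learn you would pass pca.explained_variance_ratio_ to the second function instead.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of the total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                   # PCA works on centered data
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    var = s ** 2 / (X.shape[0] - 1)           # variance along each component
    return var / var.sum()

def n_components_for(ratios, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))                # guard against rounding at 100%

# Synthetic data (an assumption): 5 features driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))

ratios = explained_variance_ratio(X)
k = n_components_for(ratios, threshold=0.95)
print(ratios.round(4), k)
```

The threshold is a judgment call, not a rule; 90%, 95% and 99% are all commonly used. scikit-learn's PCA can also do this selection itself if you pass a float between 0 and 1 as n_components with svd_solver='full'.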

For better understanding you can go through this article on AV :




I don’t know about Python, but combining PCA with regression and choosing the components that help predictive power is a fairly old concept. In R, there’s “pcr” in the “pls” package.
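For the Python side, the idea can be sketched roughly like this with plain NumPy: regress the target on the first k principal components and keep the k that gives the lowest cross-validated error. This is only an illustration under my own assumptions, not how the “pls” package implements it; pcr_fit_predict, choose_k and the synthetic data are hypothetical names and values made up for the example.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, k):
    """Regress y on the first k principal components of X_train, predict for X_test."""
    mean_x, mean_y = X_train.mean(axis=0), y_train.mean()
    _, _, vt = np.linalg.svd(X_train - mean_x, full_matrices=False)
    V = vt[:k].T                                  # loadings of the first k components
    scores = (X_train - mean_x) @ V               # training data in component space
    coef, *_ = np.linalg.lstsq(scores, y_train - mean_y, rcond=None)
    return (X_test - mean_x) @ V @ coef + mean_y

def choose_k(X, y, max_k, n_folds=5, seed=0):
    """Return the k with the lowest cross-validated mean squared error, plus all errors."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    errors = []
    for k in range(1, max_k + 1):
        sq_err = 0.0
        for fold in folds:
            train = np.setdiff1d(idx, fold)       # every row not in the held-out fold
            pred = pcr_fit_predict(X[train], y[train], X[fold], k)
            sq_err += np.sum((pred - y[fold]) ** 2)
        errors.append(sq_err / len(y))
    return int(np.argmin(errors)) + 1, errors

# Synthetic data (an assumption): the target depends on all 6 features
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))
y = X @ np.ones(6) + 0.1 * rng.normal(size=120)

best_k, cv_errors = choose_k(X, y, max_k=6)
print(best_k, np.round(cv_errors, 3))
```

Note that the components here are still ordered by the variance they explain in X, so the target only influences how many of them are kept, not which directions they point in.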