# What should be the number of components in PCA?

#1

I am working on a problem for which I have computed all the principal components, but I want to know how to choose which components to keep, and how many, so that all the variables are predicted correctly.

```
my.wines <- read.csv("http://steviep42.bitbucket.org/YOUTUBE.DIR/wines.csv", header=TRUE)
pairs(my.wines[,-1])
my.prc <- prcomp(my.wines[,-1], center=TRUE, scale=TRUE)
my.prc
Standard deviations:
[1] 2.182364e+00 1.345416e+00 5.939266e-01 2.727583e-01 1.406343e-16

Rotation:
                PC1         PC2          PC3         PC4           PC5
Hedonic -0.39649537 -0.11492093  0.802467762  0.05192087  1.628461e-01
Meat    -0.44544114  0.10904271 -0.281059542 -0.27447822  6.783554e-01
Dessert -0.26456063  0.58542927 -0.096071446  0.76029634 -1.360023e-15
Price    0.41597371  0.31111971  0.007335518 -0.09388916  5.675693e-01
Sugar   -0.04852953  0.72445063  0.216110570 -0.54740702 -3.393711e-01
Alcohol -0.43850226 -0.05545248 -0.465755989 -0.16874049 -2.753171e-01
Acidity -0.45465130 -0.08646543  0.064304541 -0.08350114  1.441867e-02
```

#2

I recommend computing the cumulative sum of the eigenvalues and dividing each value by the total sum of the eigenvalues. Plotting these values shows the fraction of total variance retained against the number of eigenvalues. It is advisable to keep the first k eigenvectors that together capture at least 85% of the total variance.
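As a rough sketch of that calculation in R: the eigenvalues are the squares of the standard deviations that `prcomp` reports, so the values printed in the question can be reused directly. The vector `sdev` below is copied from that output, and the names `eigenvalues`, `cum_var`, `threshold` and `k` are only illustrative.

```r
# Standard deviations copied from the prcomp output posted in the question
sdev <- c(2.182364e+00, 1.345416e+00, 5.939266e-01, 2.727583e-01, 1.406343e-16)

# The eigenvalues are the squared standard deviations of the components
eigenvalues <- sdev^2

# Fraction of total variance retained by the first 1, 2, ... components
cum_var <- cumsum(eigenvalues) / sum(eigenvalues)
print(round(cum_var, 3))

# Smallest number of components whose cumulative fraction reaches the threshold
threshold <- 0.85  # the 85% guideline suggested above
k <- which(cum_var >= threshold)[1]
print(k)

# Plot the retained fraction against the number of components
plot(cum_var, type = "b", ylim = c(0, 1),
     xlab = "Number of components",
     ylab = "Cumulative fraction of variance")
abline(h = threshold, lty = 2)  # dashed line marks the threshold
```

With the posted values, the first two components retain roughly 94% of the variance, so this guideline would keep two. If the fitted object is still available, `summary(my.prc)` should report the same cumulative proportions.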

Hope this helps!

Regards,
Imran

#3

@hinduja1234 you have performed the PCA in R correctly. The next step is to draw a scree plot of `my.prc`: it shows how much each component contributes, which helps you decide how many components to keep and how many to drop.
Using your code, the only addition needed after creating `my.prc` is:
`screeplot(my.prc)`

In the resulting plot we can see that only the first two components are major contributors.
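For readers who cannot run the original code, here is a minimal sketch that redraws the same picture from the numbers printed in the question. `screeplot` plots the component variances, which are the squared standard deviations, so a plain `barplot` of those values should look the same. The vector `sdev` is copied from the posted output, and the names `variances` and `prop_var` are only illustrative.

```r
# Standard deviations copied from the prcomp output posted in the question
sdev <- c(2.182364e+00, 1.345416e+00, 5.939266e-01, 2.727583e-01, 1.406343e-16)

# Component variances: these are the bar heights that screeplot() draws
variances <- sdev^2
names(variances) <- paste0("PC", seq_along(variances))
print(round(variances, 3))

# Share of the total variance carried by each component
prop_var <- variances / sum(variances)
print(round(prop_var, 3))

# Bar chart of the variances, one bar per component
barplot(variances, ylab = "Variances",
        main = "Scree plot from the posted standard deviations")
```

On these values PC1 and PC2 carry roughly 68% and 26% of the variance, and the remaining three components share about 6%, which matches the reading of the plot above.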

Hope this helps!