I am working on a Kaggle competition that was hosted a few years back. It is a classification problem. The dataset has around 40 features, and I saw on the forum that many people had applied PCA to achieve a good score. However, I am confused about when one should apply PCA and how to tune the number of components and other PCA parameters. Can someone please explain?
This is explained well in this post: http://blog.explainmydata.com/2012/07/should-you-apply-pca-to-your-data.html
@aditya1702 - I typically use PCA as one of several feature selection approaches, along with others such as ExtraTrees or Recursive Feature Elimination (RFE). If the cross-validation score with PCA is better than with the other feature selection methods, then PCA is a good candidate for the problem. The number of components can be chosen using the explained variance (%) metric: ideally, you want a high percentage of explained variance with as few components as possible. Hope that helps.
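As a rough sketch of the "explained variance %" rule: given the variances along each principal component (the eigenvalues of the covariance matrix), pick the smallest number of components whose cumulative share reaches a threshold. The function name and the example eigenvalues below are hypothetical, and the threshold itself is an arbitrary choice you could also tune with cross-validation. In scikit-learn the per-component ratios are available as `explained_variance_ratio_` on a fitted `PCA`, and `PCA(n_components=0.95, svd_solver="full")` applies the same rule for you.

```python
from itertools import accumulate


def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of components whose cumulative explained variance
    ratio reaches `threshold` (a fraction in (0, 1])."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    # PCA orders components by explained variance, largest first.
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    for k, cumulative in enumerate(accumulate(values), start=1):
        if cumulative / total >= threshold:
            return k
    # Guard against floating-point round-off when threshold == 1.
    return len(values)


# Hypothetical eigenvalues (variance along each component) for 4 features.
eigenvalues = [6.0, 2.0, 1.0, 1.0]
print(n_components_for_variance(eigenvalues, threshold=0.85))  # 3
```

Here the cumulative shares are 60%, 80%, 90% and 100%, so 3 components are the fewest that reach 85%.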
In my view, there are no specific rules for when to apply PCA, but you can use it in scenarios such as these:
- When you want to reduce the number of features. Keep in mind that you will lose interpretability, since each new feature is a combination of the original ones.
- When your machine cannot handle the data at its original dimensionality
- When a reduced number of features (say, only 1% of the original features) explains most of the variance in the data
- When visualizing the data in lower dimensions (e.g., 2D or 3D)
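For the reduction and visualization cases above, here is a minimal, dependency-free sketch of what PCA computes: center the data, build the covariance matrix, find its top eigenvectors, and project each row onto them. The function `pca` and its parameters are hypothetical, and power iteration with deflation is my choice for brevity; libraries such as scikit-learn use an SVD instead, which is what you would use in practice.

```python
import math
import random


def pca(rows, k, iterations=500, seed=0):
    """Minimal PCA sketch: returns (components, eigenvalues, projected rows).

    Power iteration with deflation on the sample covariance matrix; fine for
    small illustrative data, not a replacement for an SVD-based library.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(k):
        # Random unit start vector, (almost surely) not orthogonal to the target.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance of the data along v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: remove this direction so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Coordinates of each centered row in the new k-dimensional space.
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return components, eigenvalues, projected


# Hypothetical 3-feature rows reduced to 2 coordinates, e.g. for a scatter plot.
rows = [[2.0, 0.0, 1.0], [-2.0, 0.0, -1.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
components, eigenvalues, coords = pca(rows, k=2)
for point in coords:
    print([round(x, 3) for x in point])
```

The signs of the components are arbitrary (flipping a component's sign describes the same axis), so only their directions and the eigenvalues are meaningful.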
When to use PCA?
You don’t need to reduce the dimensions if you are just predicting or classifying.
However, if you are making decisions based on the relationships between variables, you should do a PCA.
The only scenario in which you can leave PCA out is when you don’t see multicollinearity or correlations among your variables.
But if you do see strong or even moderate correlation between variables, PCA is highly recommended.
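Since this advice hinges on whether your variables are correlated, one way to check is to compute pairwise Pearson correlations and flag the strongly related pairs. This is a standard-library sketch; the function names, the example columns, and the 0.5 cutoff for "moderate" correlation are my own assumptions, not a standard. If you already use pandas, `DataFrame.corr()` gives the full correlation matrix directly.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.5):
    """List (feature_a, feature_b, r) for every pair with |r| >= threshold.

    `columns` maps a feature name to its list of values.
    """
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical feature columns.
features = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],  # exactly 2 * a
    "c": [1, -1, 1, -1],
}
for a, b, r in correlated_pairs(features, threshold=0.5):
    print(f"{a} ~ {b}: r = {r:.2f}")
```

Pearson correlation only captures linear relationships, which is also the kind of redundancy PCA removes.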
The usual purpose of PCA is dimensionality reduction, i.e., describing the relationships in your data using fewer dimensions than are actually present.
A component that explains a lot of variance could be a good feature, but not necessarily: PCA is unsupervised and never looks at the target, so it is not geared toward that purpose.
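To illustrate that last point, here is a small constructed example (the data is made up): one feature has a large variance but carries no information about the label, while a low-variance feature separates the classes perfectly. PCA picks the noisy feature as its first component. The closed-form 2x2 eigen-solution and the helper names are only there to keep the sketch dependency-free.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_component_2d(xs, ys):
    """Leading principal direction, its variance, and the total variance of
    2-D data, via the closed-form eigen-solution of the 2x2 covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    if b != 0:
        v = (b, lam - a)  # solves (a - lam) * v0 + b * v1 = 0
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)  # already axis-aligned
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm), lam, a + c


# Made-up data: x1 has large variance but says nothing about the label;
# x2 has tiny variance but separates the two classes perfectly.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
x1 = [-10, 10, -5, 5, -10, 10, -5, 5]
x2 = [-0.5, -0.5, -0.5, -0.5, 0.5, 0.5, 0.5, 0.5]

(v0, v1), lam, total = top_component_2d(x1, x2)
mx1, mx2 = sum(x1) / len(x1), sum(x2) / len(x2)
pc1_scores = [v0 * (p - mx1) + v1 * (q - mx2) for p, q in zip(x1, x2)]

ratio = lam / total
r_pc1 = pearson(pc1_scores, labels)
r_x2 = pearson(x2, labels)
print(f"PC1 explains {ratio:.1%} of the variance")
print(f"corr(PC1, label) = {r_pc1:.2f}, corr(x2, label) = {r_x2:.2f}")
```

The first component captures almost all of the variance yet is uncorrelated with the label, which is why the number of components is better validated against the model's score than chosen from variance alone.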
What is the difference between principal component analysis (PCA) and feature selection in machine learning?
Both methods reduce dimensionality (the number of predictors).
PCA combines similar (correlated) attributes and creates new ones, which can summarize the original attributes more compactly.
Feature selection doesn’t combine attributes; it just evaluates their quality and predictive power and selects the best set.
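To make the contrast concrete, a filter-style feature selector returns a subset of the original columns, whereas PCA would return new columns that are each a weighted mix of all of them. This sketch scores features by absolute Pearson correlation with the target, which is just one simple criterion chosen for illustration; the function name and data are hypothetical. RFE or tree importances (mentioned earlier in the thread) are alternatives, and for a class label scikit-learn's `SelectKBest` with an ANOVA F-test (`f_classif`) is a more common choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_k_best(columns, target, k):
    """Filter-style feature selection: return the names of the k original
    features with the highest absolute correlation to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: three original features and a numeric target.
features = {"a": [1, 2, 3, 4], "b": [4, 1, 3, 2], "c": [2, 4, 6, 9]}
target = [1, 2, 3, 4]
print(select_k_best(features, target, k=2))  # ['a', 'c']
```

The output is a list of original feature names, so the selected model inputs stay interpretable.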
Thanks for helping