I am currently learning ML algorithms and implementing them in R. I have a couple of basic questions.
1.) Is dimensionality reduction the same as feature selection?
I know that in R, passing importance=TRUE to the randomForest function gives you the important features (based on information gain, as far as I understand). I was reading a bit about PCA and learned that it is a dimensionality reduction technique that transforms your feature space into new dimensions. How does PCA calculate attribute importance? How do I get the subset of important features using PCA in R?
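To make the first question concrete, here is a minimal sketch of where I get stuck with PCA. The built-in iris data is only a stand-in for my real dataset, and the choice of two components is arbitrary. I can see the variance explained and the loadings, but not how to turn them into a subset of the original features.

```r
# iris is only a stand-in for my real dataset (assumption for illustration)
X <- iris[, 1:4]                      # numeric features only
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: how each original feature contributes to each component
print(pca$rotation)

# Data projected onto the first two components (the new dimensions)
reduced <- pca$x[, 1:2]
print(dim(reduced))
```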
2.) I know that linear regression is a statistical model and that, before building the model, the data should satisfy some assumptions (hypotheses), such as:
All the attributes in the dataset must be IID.
Residuals must be normally distributed.
Homoscedasticity among attributes.
How do I check whether the attributes satisfy these assumptions before building the model?
Does running cor() on the attributes and removing those with high correlation ensure that my attributes are homoscedastic?
Regarding IID, do I need to run t.test() or a chi-square test (chisq.test()) across all the attributes?
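For reference, this is roughly the correlation filtering I have in mind. It is only a sketch: mtcars stands in for my data, mpg is treated as the target, and the 0.9 cutoff is an arbitrary value I picked. I am not sure whether it says anything about homoscedasticity.

```r
# mtcars is a stand-in dataset; the 0.9 cutoff is an arbitrary choice of mine
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]
cm <- cor(predictors)

# Keep each pair once by blanking the lower triangle and the diagonal
cm[lower.tri(cm, diag = TRUE)] <- NA
high <- which(abs(cm) > 0.9, arr.ind = TRUE)

# Pairs of attributes whose absolute correlation exceeds the cutoff
high_pairs <- data.frame(
  attr1 = rownames(cm)[high[, "row"]],
  attr2 = colnames(cm)[high[, "col"]],
  r     = cm[high]
)
print(high_pairs)
```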
3.) How do I check correlation for categorical/multi-class attributes? Usually in R the input to cor() is the subset of numerical attributes from the dataset. What should I do with the categorical ones?
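This is what I currently do for the third question, sketched on iris as a stand-in, with Species playing the role of my categorical attribute. The categorical column is dropped before calling cor(), so it never appears in the result.

```r
# iris is a stand-in; Species plays the role of my categorical attribute
is_num <- sapply(iris, is.numeric)
num_cor <- cor(iris[, is_num])
print(round(num_cor, 2))

# The categorical column is simply left out of the matrix above
categorical <- names(iris)[!is_num]
print(categorical)
```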
My understanding may be wrong in many ways; please correct me.
Sorry if this is a naive question.