Do ensemble techniques take care of high correlation?




I have a dataset in which all the variables are highly correlated:

I ran PCA on this dataset, but the results are not very encouraging. The scree plot shows that a single component would be enough:

However, if I take only two components, their loadings are:

As the loadings matrix shows, the correlations between the variables and the PCs are quite low, so I guess applying PCA is not going to help either. I could use ada to perform additive logistic regression, but I am not sure whether that would be the right method.
Ultimately I have to predict churn, but the data is highly skewed:

A decision tree is not helping either, as only one node (the root) is generated.
Can someone please guide me on this one?


PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of values of M possibly correlated variables into a set of values of K uncorrelated variables called principal components.
The more the variables are correlated, the fewer uncorrelated principal components are needed to describe them. So applying PCA would not help in this case.
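
To make the link between correlation and the number of useful components concrete, here is a minimal sketch in plain Python. It is not based on the poster's data: it assumes an idealised correlation matrix in which every pair of the p variables has the same correlation rho, and the function names are made up for this illustration. It estimates the largest eigenvalue by power iteration and reports the share of total variance that the first principal component captures.

```python
import math


def equicorrelation_matrix(p, rho):
    """Correlation matrix of p variables that all share the same pairwise correlation rho."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


def leading_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    # Uneven start vector, so the iteration does real work rather than starting on the answer.
    vec = [1.0 + 0.01 * i for i in range(n)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient v^T A v for the unit vector v.
    return sum(vec[i] * matrix[i][j] * vec[j] for i in range(n) for j in range(n))


def first_component_share(p, rho):
    """Fraction of total variance captured by the first principal component."""
    # The eigenvalues of a correlation matrix sum to p (its trace).
    return leading_eigenvalue(equicorrelation_matrix(p, rho)) / p


# Assumed setting: 10 variables, non-negative shared correlation.
for rho in (0.1, 0.5, 0.9):
    print(f"rho={rho}: first PC explains {first_component_share(10, rho):.2f} of the variance")
```

For this idealised matrix the share is (1 + (p - 1) * rho) / p, so with p = 10 it grows from 0.19 at rho = 0.1 to 0.91 at rho = 0.9. That is consistent with a scree plot in which one component dominates when all variables are highly correlated.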

Ensemble methods combine many predictors to give a more accurate prediction.
Bagging and boosting are two examples of ensemble methods. They usually decrease the variance of the base model.
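
The variance-reduction effect of bagging can be sketched with a small simulation in plain Python. This is an illustration under assumptions, not the setup from the question: the base model is a 1-nearest-neighbour regressor (chosen because it is very unstable), the targets are pure Gaussian noise around zero, and the function names and parameter values are made up. For each freshly drawn training set it compares the prediction of a single base model at one point with the average prediction over bootstrap resamples.

```python
import random
import statistics


def one_nn_predict(train, x0):
    """1-nearest-neighbour prediction: return the y of the training point closest to x0."""
    return min(train, key=lambda point: abs(point[0] - x0))[1]


def bagged_predict(train, x0, n_models, rng):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) points with replacement.
        resample = [rng.choice(train) for _ in train]
        predictions.append(one_nn_predict(resample, x0))
    return statistics.fmean(predictions)


def prediction_variances(n_trials=300, n_points=30, n_models=25, seed=0):
    """Variance of the prediction at x0 across training sets, single model vs bagged."""
    rng = random.Random(seed)
    x0 = 0.5
    single, bagged = [], []
    for _ in range(n_trials):
        # Assumed data: x uniform on [0, 1), y is standard Gaussian noise.
        train = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(n_points)]
        single.append(one_nn_predict(train, x0))
        bagged.append(bagged_predict(train, x0, n_models, rng))
    return statistics.variance(single), statistics.variance(bagged)


single_var, bagged_var = prediction_variances()
print(f"variance of single 1-NN prediction: {single_var:.3f}")
print(f"variance of bagged 1-NN prediction: {bagged_var:.3f}")
```

The bagged prediction should vary noticeably less from one training set to the next than the single model does. Note that this only shows variance reduction for the predictions; it says nothing about the correlation between the input variables.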

The correlation coefficient is often defined as the ratio of between-subject variation to total variation. If the between-subject variation is high compared to the within-subject variation, the correlation coefficient will be high. So it can be said that we can apply an ensemble method to decrease the correlation.

Hope this helps!


Hi @hinduja1234,

Thanks for the quick reply. I tried using ada:

As you can see, the results are not very promising. How can I tweak ada to achieve better accuracy?