Data Science: Dimensionality Reduction



Hi Friends,

For numerical variables in dimensionality reduction:
a) Is it enough to keep only non-correlated variables?
b) What are the reasons to remove highly (positively or negatively) correlated variables?

Please explain; if you have an example, please provide it.

Thank you for your help.


We remove highly correlated variables to avoid double counting.

yrs_birth = years since birth
age = age of the person

Both columns give you the same information and are highly correlated.
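To make this concrete, here is a small sketch (my own toy data, not from the thread) showing that two columns carrying the same information have a correlation of 1:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
age = rng.integers(18, 80, size=100)

# "yrs_birth" is the same quantity under a different name.
df = pd.DataFrame({"age": age, "yrs_birth": age})

print(df.corr().loc["age", "yrs_birth"])  # 1.0
```

Keeping both columns would feed the model the same signal twice, which is the double counting mentioned above.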


You can choose a threshold for the maximum correlation you allow between features. For example, if the absolute correlation between two features is greater than 0.7, you might drop one of them.
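A minimal sketch of that thresholding idea (the helper name and the 0.7 cutoff are just illustrative choices):

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.7):
    """Drop one feature from every pair whose absolute correlation
    exceeds the threshold. Which one to drop is arbitrary here."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(42)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "x_noisy": x + 0.05 * rng.normal(size=200),  # nearly a copy of x
    "z": rng.normal(size=200),                   # independent feature
})
reduced = drop_correlated(df, threshold=0.7)
print(list(reduced.columns))  # ['x', 'z']
```

`x_noisy` is dropped because its correlation with `x` is close to 1, while the independent feature `z` survives.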


hi @fornanthu

First, what do you mean by dimensionality reduction? There are various methods. Removing highly correlated variables (for example, when using linear models) is one, but it is not always necessary. PCA is one way to reduce dimensions: if your variables are highly correlated, you will end up with only a few principal components. t-SNE is another method, based on neighborhood density, and SVD is an algebraic method.
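A small sketch of the PCA point (toy data of my own, PCA computed via SVD of the centered matrix): when features are highly correlated, almost all the variance collapses into a few components.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
# Three features that are all noisy copies of one underlying signal.
X = np.hstack([base + 0.1 * rng.normal(size=(500, 1)) for _ in range(3)])

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / (s**2).sum()
print(explained.round(3))
```

The first component explains well over 90% of the variance, so the three correlated features effectively reduce to one dimension.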

All the methods mentioned are easy to find on Wikipedia.

Best regards



All the comments are true, and I want to add that we remove highly correlated variables so our model is not biased toward those variables; one of them is enough, because they represent each other.