How to handle multicollinearity in data efficiently?

multi_collinearity
data_wrangling

#1

Hi,

I am building a predictive model with 6 continuous independent variables. I’m trying to select the key variables among these variables to build the model. However, there is collinearity between three of the variables.

So, with small changes in data, I get very different results and interpretations. The variables which are significant on one subset of data don’t remain significant on other subsets. This is causing a lot of confusion and I don’t know how to explain this to my stakeholders.

I am using SAS to build this model. Can you suggest efficient methods to deal with it?

Thanks,
Imran


#2

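You can run a factor analysis on your independent variables (a varimax orthogonal rotation is suggested) and then use the derived factor scores as predictors. Because the rotated factors are orthogonal, this largely removes the multicollinearity among the predictors and brings down your VIF. The trade-off is that the coefficients then describe factors rather than the original variables.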
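A minimal sketch of this in SAS, assuming a dataset called mydata with a response y and predictors x1 to x6 (all of these names are hypothetical, and nfactors=3 is only a placeholder to be chosen from the eigenvalues or a scree plot):

```sas
/* Hypothetical names: mydata, y, x1-x6. nfactors=3 is a placeholder. */
proc factor data=mydata
            method=principal    /* principal component extraction */
            rotate=varimax      /* orthogonal rotation */
            nfactors=3          /* must be given when out= is used */
            out=factor_scores;  /* input data plus Factor1-Factor3 */
   var x1-x6;
run;

/* Refit the model on the factor scores and check the VIFs again */
proc reg data=factor_scores;
   model y = Factor1-Factor3 / vif;
run;
quit;
```

With principal component extraction and an orthogonal rotation, the scores should be uncorrelated, so the VIFs in the second step should come out close to 1.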


#3

Imran,

The best way to deal with multicollinearity is to identify its cause and remove it. Multicollinearity occurs when two or more predictors are highly correlated with each other. If you can identify variables in your model that are not essential, removing them reduces multicollinearity. Examining the correlations between the variables helps you decide which variable to drop from the model.
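As a rough sketch, the pairwise correlations can be listed with PROC CORR. The dataset name mydata and the variable names x1 to x6 are hypothetical stand-ins for your six predictors:

```sas
/* Hypothetical names: mydata, x1-x6 */
proc corr data=mydata;
   var x1-x6;   /* prints the Pearson correlation matrix with p-values */
run;
```

Pairs with a large absolute correlation are the first candidates to look at. Keep in mind that pairwise correlations can miss collinearity that involves three or more variables together.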

You can check for collinearity with a statistical measure called the variance inflation factor (VIF). The higher the VIF of a variable, the more strongly it is collinear with the other predictors.
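For reference, the VIF of predictor $j$ is computed from $R_j^2$, the R-squared obtained when predictor $j$ is regressed on all the other predictors:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$

For example, $R_j^2 = 0.8$ gives $\mathrm{VIF}_j = 1/(1 - 0.8) = 5$, and $R_j^2 = 0.9$ gives $\mathrm{VIF}_j = 10$.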

You can also use other methods, such as principal component analysis (PCA), to address this.

Sunil


#4

Hi Imran,

To remove multicollinearity, you can drop the variable whose Variance Inflation Factor (VIF) is the highest among all the variables, provided it is greater than 5. You can check the VIF values in SAS with the following code:

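/* supply your dataset name after data= */
proc reg data= ;
   model y = col1 col2 … coln / vif;   /* vif adds a Variance Inflation column */
run;
quit;   /* proc reg stays active until quit */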

This produces a table of VIF values. Remove only one variable at a time: the one with the highest VIF, and only if that VIF is greater than 5. A VIF below 5 is commonly taken as a rule of thumb that the model does not have a serious multicollinearity problem. After removing the variable, run the model again and check whether the VIF of every remaining variable is under 5. If not, repeat the process.
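One pass of that loop could be sketched as below. The names mydata, y, x1 to x6 and vif_table are hypothetical. I am also assuming the ODS table is called ParameterEstimates and that its VIF column is named VarianceInflation, so please confirm with proc contents on your own output:

```sas
/* Hypothetical names: mydata, y, x1-x6, vif_table */
ods output ParameterEstimates=vif_table;   /* capture the estimates table */
proc reg data=mydata;
   model y = x1-x6 / vif;
run;
quit;

/* Sort so the variable with the largest VIF comes first */
proc sort data=vif_table;
   where Variable ne 'Intercept';          /* the intercept is not a candidate */
   by descending VarianceInflation;
run;

proc print data=vif_table;
   var Variable VarianceInflation;
run;
```

If the first row shows a VIF above 5, drop that variable from the model statement and run the steps again.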

Another solution is to use Principal Component Analysis (PCA), which converts the correlated variables into a set of uncorrelated components. A group of highly collinear variables can then often be represented by a single component.
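A minimal sketch with PROC PRINCOMP, assuming hypothetically that x1 to x3 are the three collinear predictors, x4 to x6 are the others, and the data is in mydata:

```sas
/* Hypothetical names: mydata, y, x1-x6, pc_scores */
proc princomp data=mydata out=pc_scores;   /* out= adds Prin1-Prin3 */
   var x1-x3;                              /* only the collinear predictors */
run;

/* Use the first component in place of x1-x3 */
proc reg data=pc_scores;
   model y = Prin1 x4-x6 / vif;
run;
quit;
```

Before relying on Prin1 alone, check in the PROC PRINCOMP output how much of the variance it explains. If it is low, you may need Prin2 as well.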

Hope this helps.

Regards,
Aayush