How to identify variables to include in or exclude from a linear regression model?



I am using a linear regression model to predict a target variable from sample observations of several input variables. Which statistical measures, and at what significance level, should guide me in deciding whether to include a specific variable in the model or exclude it?



Hi Steve,

You can remove variables that are not statistically significant (keep those with p < 0.05, drop those with p > 0.05). After that, you can check the model assumptions: look for highly collinear variables (multicollinearity) and remove them, test for heteroscedasticity, examine the residuals, etc.
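As a rough illustration of the collinearity check mentioned above, here is a minimal sketch that computes the variance inflation factor (VIF) for each predictor using only NumPy. The data are simulated and the helper name `vif` is hypothetical; a VIF above roughly 5 to 10 is a common rule of thumb for problematic collinearity, not a hard threshold.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        # Regress predictor j on an intercept plus all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        total = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / total
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated (hypothetical) predictors: b is nearly a copy of a, c is unrelated.
rng = np.random.default_rng(1)
n = 300
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)
c = rng.normal(size=n)

vifs = vif(np.column_stack([a, b, c]))
for name, value in zip(["a", "b", "c"], vifs):
    print(f"{name}: VIF = {value:.1f}")
```

Here `a` and `b` should show very large VIFs, which suggests dropping one of them, while `c` should stay close to 1.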

Hope this helps.

Aayush Agrawal


Hi Steve

Let me take an example to explain this question clearly. This is a basic question that confuses many people who solve analytics problems.

This is the output data. I have used IBM SPSS to solve this question.

Dependent Variable is P/E Ratio
Independent Variables are Dividend Payout Ratio, Debt/Equity Position, Firm's ROE/Industry ROE, Growth rate in sales

To find out which variables are significant and should be kept in the model, look at two statistics: the significance value (p-value) and the t statistic, as shown above.

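A variable is said to be significant if:
p-value < 0.05 (5% significance level, i.e. 95% confidence level)
|t statistic| > 2 (irrespective of the sign; 2 is a rule of thumb for the exact large-sample critical value of about 1.96)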
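To make the two criteria above concrete, here is a minimal sketch that fits an ordinary least squares model with NumPy and computes the t statistic for each coefficient. The data are simulated, the function name `ols_t_stats` is hypothetical, and the p-values use a normal approximation to the t distribution, which is only reasonable for large samples; statistical packages such as SPSS report exact t-based p-values.

```python
import math
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS with an intercept; return coefficients, standard errors, and t statistics."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])            # prepend intercept column
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ coef
    dof = n - Xd.shape[1]                            # residual degrees of freedom
    sigma2 = (resid @ resid) / dof                   # estimated error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    return coef, se, coef / se

# Simulated (hypothetical) data: x0 and x1 drive y, x2 has no true effect.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

coef, se, t = ols_t_stats(X, y)

# Two-sided p-values under a normal approximation (an assumption; fine for large n).
p_approx = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])

for name, tv, pv in zip(["intercept", "x0", "x1", "x2"], t, p_approx):
    verdict = "keep" if abs(tv) > 2 else "candidate to drop"
    print(f"{name}: t = {tv:.2f}, approx p = {pv:.4f} -> {verdict}")
```

The two rules are two views of the same test: a |t| above about 2 corresponds to a p-value below about 0.05, so in practice they agree.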

Hence, from the given output we can infer that Debt/Equity Position, Firm's ROE/Industry ROE, and Growth rate in sales have a statistically significant effect on the P/E ratio of the company.

Also, the Standardized Coefficients (Beta) column shows the relative impact of each independent variable on the dependent variable. The larger the absolute value, the greater the impact.
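For illustration, standardized coefficients can be obtained by rescaling each raw coefficient by the ratio of the predictor's standard deviation to the response's standard deviation. The sketch below uses simulated data and a hypothetical helper name, `standardized_betas`; it shows a case where the predictor with the smaller raw coefficient has the larger standardized one because it is measured on a much wider scale.

```python
import numpy as np

def standardized_betas(X, y):
    """Standardized OLS slopes: raw slope times sd(x_j) / sd(y)."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])            # prepend intercept column
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    raw = coef[1:]                                   # drop the intercept
    return raw, raw * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Simulated (hypothetical) data: x1 is on a scale 100 times wider than x0.
rng = np.random.default_rng(2)
n = 400
X = np.column_stack([
    rng.normal(scale=1.0, size=n),
    rng.normal(scale=100.0, size=n),
])
y = 3.0 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(size=n)

raw, betas = standardized_betas(X, y)
for name, r, b in zip(["x0", "x1"], raw, betas):
    print(f"{name}: raw coefficient = {r:.3f}, standardized beta = {b:.3f}")
```

Comparing raw coefficients alone would rank `x0` first, whereas the standardized values put `x1` first, which is why the Beta column is the one to compare across variables.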

The statistics described above will help you choose the appropriate variables for the model.

Hope this helps!