# Why does the significance of a variable change when the number of variables changes?

#1

I am currently working on a regression problem, for which I have created a linear regression model in R.

model1=lm(Price ~ Year+WinterRain+AGST+HarvestRain+Age+FrancePop,data=wine)
summary(model1)

Call:
lm(formula = Price ~ Year + WinterRain + AGST + HarvestRain + Age + FrancePop, data = wine)

Residuals:
Min       1Q   Median       3Q      Max
-0.48179 -0.24662 -0.00726  0.22012  0.51987

Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.092e-01  1.467e+02   0.005 0.996194
Year        -5.847e-04  7.900e-02  -0.007 0.994172
WinterRain   1.043e-03  5.310e-04   1.963 0.064416 .
AGST         6.012e-01  1.030e-01   5.836 1.27e-05 ***
HarvestRain -3.958e-03  8.751e-04  -4.523 0.000233 ***
Age                 NA         NA      NA       NA
FrancePop   -4.953e-05  1.667e-04  -0.297 0.769578
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3019 on 19 degrees of freedom
Multiple R-squared:  0.8294,    Adjusted R-squared:  0.7845
F-statistic: 18.47 on 5 and 19 DF,  p-value: 1.044e-06

I judged the significance of each variable by the number of stars next to it in the summary. FrancePop was not significant, so I removed it to get a better model.

##creating new model
model1=lm(Price ~ Year+WinterRain+AGST+HarvestRain+Age,data=wine)
summary(model1)

Call:
lm(formula = Price ~ Year + WinterRain + AGST + HarvestRain +  Age, data = wine)

Residuals:
Min       1Q   Median       3Q      Max
-0.45470 -0.24273  0.00752  0.19773  0.53637

Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.0248601 16.4434570   2.677 0.014477 *
Year        -0.0239308  0.0080969  -2.956 0.007819 **
WinterRain   0.0010755  0.0005073   2.120 0.046694 *
AGST         0.6072093  0.0987022   6.152  5.2e-06 ***
HarvestRain -0.0039715  0.0008538  -4.652 0.000154 ***
Age                 NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.295 on 20 degrees of freedom
Multiple R-squared:  0.8286,    Adjusted R-squared:  0.7943
F-statistic: 24.17 on 4 and 20 DF,  p-value: 2.036e-07

After creating the new model, the significance of the remaining variables has changed compared with the previous model. I want to know why this happens.

#2

Hello @hinduja1234,

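One likely reason is multicollinearity: a variable can be insignificant in the presence of the other variables and still be highly correlated with them, or be largely explained by them. When two predictors carry almost the same information, the model cannot separate their effects, so the standard errors of both are inflated and neither looks significant. Your output is consistent with this: the standard error of Year drops from 7.900e-02 to 0.0081 once FrancePop is removed, which suggests FrancePop is strongly correlated with Year.
You can regress FrancePop on the other explanatory variables to check this; a high R-squared in that regression would point to multicollinearity.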
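
To make the idea concrete, here is a minimal sketch on simulated data (not the wine data). The names `x1`, `x2`, `age` and `y` are made up for illustration: `x1` plays the role of a year-like variable, `x2` a variable that trends almost linearly with it, and `age` an exact linear function of `x1`, which I am only assuming is how Age relates to Year in your data.

```r
# Simulated illustration only; these are not the wine data.
set.seed(1)
n   <- 25
x1  <- 1952:1976                                   # year-like predictor
x2  <- 40000 + 500 * (x1 - 1952) + rnorm(n, sd = 200)  # trends almost linearly with x1
age <- 1983 - x1                                   # exact linear function of x1
y   <- 50 - 0.02 * x1 + rnorm(n, sd = 0.3)

full    <- lm(y ~ x1 + x2)   # both correlated predictors
reduced <- lm(y ~ x1)        # correlated predictor dropped

# Standard error of the x1 coefficient in each model
se_full    <- summary(full)$coefficients["x1", "Std. Error"]
se_reduced <- summary(reduced)$coefficients["x1", "Std. Error"]

# Auxiliary regression: how well does x1 explain x2?
r2_aux <- summary(lm(x2 ~ x1))$r.squared
vif_x1 <- 1 / (1 - r2_aux)   # variance inflation factor in the two-predictor model

# A perfectly collinear predictor cannot be estimated and comes back as NA
aliased <- lm(y ~ x1 + age)

print(c(se_full = se_full, se_reduced = se_reduced, vif = vif_x1))
print(coef(aliased))

# On the real data (assuming the `wine` data frame from the question is loaded):
# summary(lm(FrancePop ~ Year + WinterRain + AGST + HarvestRain, data = wine))
# cor(wine$Year, wine$FrancePop)
```

The standard error of `x1` should be much larger in the full model than in the reduced one, and the `age` coefficient is reported as NA, similar to the "not defined because of singularities" note in your summaries.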

You can refer to http://www.jerrydallal.com/lhsp/regcoef.htm for a more detailed explanation.

Hope this helps!!

#3

I would like the dataset used for this wine price prediction.

#4

I think you’ll find the dataset here. Links to some other similar datasets (and tutorials) can be found in this blog.