Linear Regression in R : Coefficients having NA in summary(model)

#1

Hi Guys,

I need your guidance on the three points below.

Input: a data set with 26 features, on which I need to fit lm with price as the response.
lot <- sample(205,42)
data_train <- data_raw[lot, ]
data_test <- data_raw[-lot,]

1. When I run lm(price ~ ., data_train) I get the error
“contrasts can be applied only to factors with 2 or more levels”.
However, if I run
lm(formula = price ~ symboling + normalized.losses + make + fuel.type +
aspiration + num.of.doors + body.style + drive.wheels + engine.location +
wheel.base + length + width + height + curb.weight + engine.type +
num.of.cylinders + engine.size + fuel.system + bore + stroke +
compression.ratio + horsepower + peak.rpm + city.mpg + highway.mpg,
data = data_raw)
no error appears.

2. When I run
lm(formula = price ~ symboling + normalized.losses + make + fuel.type +
aspiration + num.of.doors + body.style + drive.wheels + engine.location +
wheel.base + length + width + height + curb.weight + engine.type +
num.of.cylinders + engine.size + fuel.system + bore + stroke +
compression.ratio + horsepower + peak.rpm + city.mpg + highway.mpg,
data = data_raw)
a few coefficients come out as NA. I checked the unique values of the variables with NA coefficients, and each of those values appears in more than 5 rows.

3. summary(model) shows:
engine.typeohc 8.250e+02 1.299e+03 0.635 0.526547
engine.typeohcf NA NA NA NA
engine.typeohcv -1.484e+03 1.424e+03 -1.041 0.299548
engine.typerotor 4.636e+03 4.783e+03 0.969 0.334262
num.of.cylindersfive -7.068e+03 2.588e+03 -2.731 0.007167 **
num.of.cylindersfour -4.833e+03 3.114e+03 -1.552 0.123069
num.of.cylinderstwelve 8.230e+02 6.116e+03 0.135 0.893157
num.of.cylinderstwo NA NA NA NA
engine.size 1.011e+02 2.577e+01 3.921 0.000140 ***
fuel.system2bbl 3.375e+03 1.688e+03 2.000 0.047552 *
fuel.system4bbl -1.062e+03 2.779e+03 -0.382 0.702981
fuel.systemidi NA NA NA NA

“two” is one of the values in num.of.cylinders. I am confused, because the summary seems to suggest that cylinder “two” is not relevant to the model. In this case, how do I remove it, or how should I proceed?

#2

Hi,

A term whose estimate is NA does not add any information to the model. That is why you are getting NA as the estimate.

If variables A and B are collinear and the model already gets the information from A, it will report the estimate for B as NA, since B gives no additional information to the model.

Try removing those variables and run the model again.
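To see which terms are redundant, you can ask R directly. Below is a minimal sketch on a small made-up data frame (toy, engine, cyl, size are invented names, not the car data from the question). In it, every "rotor" engine has "two" cylinders, so the two dummy columns are identical. Something similar may be happening in the summary above, where engine.typerotor has an estimate while num.of.cylinderstwo is NA, but that is only a guess without the data.

```r
# Toy data, invented for illustration: "rotor" and "two" always occur together,
# so their dummy columns carry exactly the same information.
set.seed(1)
toy <- data.frame(
  engine = factor(c("ohc", "ohc", "ohc", "ohc", "rotor", "rotor",
                    "dohc", "dohc", "dohc", "dohc")),
  cyl    = factor(c("four", "six", "four", "six", "two", "two",
                    "four", "six", "four", "six")),
  size   = c(97, 181, 110, 164, 70, 80, 130, 209, 122, 194)
)
toy$price <- 5000 + 60 * toy$size + rnorm(nrow(toy), sd = 500)

fit <- lm(price ~ engine + cyl + size, data = toy)

# Names of the coefficients that lm() could not estimate
na_coefs <- names(coef(fit))[is.na(coef(fit))]
print(na_coefs)

# alias() shows how each dropped term depends on the terms that were kept
print(alias(fit)$Complete)
```

The row names of alias(fit)$Complete are the dropped terms, and the non-zero entries in each row point to the kept terms they duplicate.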

#3

Hi Prakash,

Thank you.

One concern: the summary shows NA for only some values of a variable. If a column contains 4 unique values, the summary shows NA for col$value1 and col$value2 but shows coefficients for col$value3 and col$value4. In this case we cannot remove the column, am I right? The model still uses the other two values.

Regards,
Nantha.

#4

I think you can try interactions with the categorical variable, or do dummy coding with a different base level, to see whether you get estimates.

If not, you may try leaving those dummy columns out of the model.
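For the base level part, here is a rough sketch with relevel() on made-up data (toy, engine, cyl, cyl2, size are invented names, not the car data). Note that in this toy case changing the base level does not remove the redundancy. It only changes which term may be reported as NA, and the fitted values stay the same.

```r
# Toy data, invented for illustration (not the car data from the question).
set.seed(1)
toy <- data.frame(
  engine = factor(c("ohc", "ohc", "ohc", "ohc", "rotor", "rotor",
                    "dohc", "dohc", "dohc", "dohc")),
  cyl    = factor(c("four", "six", "four", "six", "two", "two",
                    "four", "six", "four", "six")),
  size   = c(97, 181, 110, 164, 70, 80, 130, 209, 122, 194)
)
toy$price <- 5000 + 60 * toy$size + rnorm(nrow(toy), sd = 500)

# Default coding: the first level in alphabetical order ("four") is the base
fit_default <- lm(price ~ engine + cyl + size, data = toy)

# Same model, but with "two" as the base level of the cylinder factor
toy$cyl2 <- relevel(toy$cyl, ref = "two")
fit_relevel <- lm(price ~ engine + cyl2 + size, data = toy)

# Compare which terms end up without an estimate in each fit
na_default <- names(coef(fit_default))[is.na(coef(fit_default))]
na_relevel <- names(coef(fit_relevel))[is.na(coef(fit_relevel))]
print(na_default)
print(na_relevel)
```

If the NA just moves to another term like this, the two factors overlap, and leaving out one of the redundant dummy columns (or merging levels) is the more direct fix.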

#5

Hi Prakash,

#6

I meant that you can try interactions between predictors to see whether you get estimates instead of NA,
like
lm(formula = y ~ f1 * f2)

Here, f1 and f2 are predictors
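In case the * notation is unfamiliar: in an R formula, f1 * f2 is shorthand for f1 + f2 + f1:f2, that is, both main effects plus their interaction. A small sketch with made-up numeric predictors (d, y, f1, f2 are placeholders, not columns from the car data):

```r
# Made-up data, only to show how the formula expands
set.seed(123)
d <- data.frame(f1 = rnorm(30), f2 = rnorm(30))
d$y <- 1 + 2 * d$f1 - d$f2 + 0.5 * d$f1 * d$f2 + rnorm(30, sd = 0.1)

fit_star     <- lm(y ~ f1 * f2, data = d)            # shorthand
fit_expanded <- lm(y ~ f1 + f2 + f1:f2, data = d)    # written out

# Both fits have the same terms: intercept, f1, f2 and f1:f2
print(names(coef(fit_star)))
print(all.equal(coef(fit_star), coef(fit_expanded)))
```

Keep in mind that an interaction between two factors adds one column per combination of levels, so combinations that never occur in the data can produce NA estimates of their own.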

#7

Remove missing values first. Once that is done, check for collinearity and remove variables whose collinearity is above 90%.
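A rough base R sketch of those two steps, on made-up numeric data (d, length, width, height are invented for the example). It assumes "more than 90" means an absolute pairwise correlation above 0.9, and it only looks at numeric predictors. Factor columns need a different check.

```r
# Made-up data: width is almost a rescaled copy of length
set.seed(42)
n <- 50
d <- data.frame(length = rnorm(n, mean = 170, sd = 12))
d$width  <- 0.3 * d$length + rnorm(n, sd = 0.2)
d$height <- rnorm(n, mean = 54, sd = 2)
d$price  <- 100 * d$length + rnorm(n, sd = 1000)
d$height[c(3, 7)] <- NA                      # two missing values

# Step 1: drop rows with missing values
clean <- na.omit(d)

# Step 2: absolute correlations between predictors (response excluded)
preds <- clean[, setdiff(names(clean), "price")]
cm <- abs(cor(preds))
cm[upper.tri(cm, diag = TRUE)] <- 0          # keep each pair only once

# Flag the later variable of every pair correlated above 0.9 (assumed cutoff)
to_drop <- names(which(apply(cm, 1, function(r) any(r > 0.9))))
print(to_drop)

fit <- lm(price ~ ., data = clean[, setdiff(names(clean), to_drop)])
```

Which variable of a correlated pair to drop is a judgment call. This sketch simply drops the one that comes later in the data frame.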