Linear Regression in R: NA coefficients in summary(model)

linear_regression
r
machine_learning

#1

Hi Guys,

I need your guidance on the three points below.

Input: a data set with 26 features; I need to fit lm on the price column.
data_raw <- read.csv("file")
lot <- sample(205, 42)
data_train <- data_raw[lot, ]
data_test <- data_raw[-lot, ]

  1. When I run lm(price ~ ., data_train) I get the error
    "contrasts can be applied only to factors with 2 or more levels".
    However, if I run
    lm(formula = price ~ symboling + normalized.losses + make + fuel.type +
    aspiration + num.of.doors + body.style + drive.wheels + engine.location +
    wheel.base + length + width + height + curb.weight + engine.type +
    num.of.cylinders + engine.size + fuel.system + bore + stroke +
    compression.ratio + horsepower + peak.rpm + city.mpg + highway.mpg,
    data = data_raw)
    no error appears.

  2. When I run
    lm(formula = price ~ symboling + normalized.losses + make + fuel.type +
    aspiration + num.of.doors + body.style + drive.wheels + engine.location +
    wheel.base + length + width + height + curb.weight + engine.type +
    num.of.cylinders + engine.size + fuel.system + bore + stroke +
    compression.ratio + horsepower + peak.rpm + city.mpg + highway.mpg,
    data = data_raw)
    a few coefficients come out as NA. I checked the values behind those NA coefficients, and each of them appears in more than 5 rows.

  3. summary(model) shows
    engine.typeohc 8.250e+02 1.299e+03 0.635 0.526547
    engine.typeohcf NA NA NA NA
    engine.typeohcv -1.484e+03 1.424e+03 -1.041 0.299548
    engine.typerotor 4.636e+03 4.783e+03 0.969 0.334262
    num.of.cylindersfive -7.068e+03 2.588e+03 -2.731 0.007167 **
    num.of.cylindersfour -4.833e+03 3.114e+03 -1.552 0.123069
    num.of.cylinderssix -3.492e+03 2.789e+03 -1.252 0.212684
    num.of.cylinderstwelve 8.230e+02 6.116e+03 0.135 0.893157
    num.of.cylinderstwo NA NA NA NA
    engine.size 1.011e+02 2.577e+01 3.921 0.000140 ***
    fuel.system2bbl 3.375e+03 1.688e+03 2.000 0.047552 *
    fuel.system4bbl -1.062e+03 2.779e+03 -0.382 0.702981
    fuel.systemidi NA NA NA NA

two is one of the values in "num.of.cylinders"... I am confused because the summary seems to suggest that two cylinders is not relevant to the model. In this case, how do I remove it, or how should I proceed?

Looking forward to your support.


#2

Hi,

A coefficient shown as NA belongs to a term that adds no information to the model beyond what the other terms already provide. That's why the estimate is NA.

If variables A and B are collinear and the model has already obtained that information from A, it will report the coefficient of B as NA, since B gives the model no additional information.

Try removing those variables and rerun the model.
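
To make the collinearity point concrete, here is a minimal sketch on synthetic data (not the car data from the question; the names engine_type, cylinders and y are made up for illustration). It builds two factors where one level of the second factor occurs exactly when one level of the first does, so the two dummy columns are identical. It is an assumption, worth checking with table() on the real data, that something similar happens there.

```r
# Hypothetical data: every "two"-cylinder row is also a "rotor" row.
set.seed(1)
engine_type <- factor(rep(c("ohc", "ohcf", "rotor"), each = 20))
cylinders   <- factor(c(rep(c("four", "six"), 20), rep("two", 20)))
y <- rnorm(60)

# The cross-tabulation shows the overlap between the two factors.
print(table(engine_type, cylinders))

fit <- lm(y ~ engine_type + cylinders)

# The dummy column for cylinders == "two" duplicates the one for
# engine_type == "rotor", so lm() cannot estimate both.
print(coef(fit))

# alias() reports which terms are linear combinations of earlier ones.
print(alias(fit))
```

Note that the NA is attached to one dummy column (one level), not to the whole variable, which is why other levels of the same factor can still get estimates.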


#3

Hi Prakash,

Thank you.

One concern: the summary shows NA for only some levels of a variable. If a column contains 4 unique values, the summary may show NA for col$value1 and col$value2 but coefficient values for col$value3 and col$value4. In that case we cannot remove the column, am I right? The model still uses the other two values.

Regards,
Nantha.


#4

I think you can try interactions with the categorical variable, or dummy coding with a different base level, to see whether you get estimates.

If not, you can try leaving those dummy columns out of the model.
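
A rough sketch of the "different base level" idea, again on made-up data (engine_type, cylinders and y are hypothetical names, not columns from the question). It uses relevel() to change the reference level of one factor and compares the two fits. In this toy setup, one coefficient is still NA after releveling and the fitted values do not change, so treat it as a way to explore the problem rather than a guaranteed fix.

```r
# Hypothetical data with one duplicated dummy column, as an illustration.
set.seed(1)
engine_type <- factor(rep(c("ohc", "ohcf", "rotor"), each = 20))
cylinders   <- factor(c(rep(c("four", "six"), 20), rep("two", 20)))
y <- rnorm(60)

# Default coding: the first level in alphabetical order ("four") is the base.
fit_default <- lm(y ~ engine_type + cylinders)

# Same model with "two" as the base level of cylinders.
cylinders_releveled <- relevel(cylinders, ref = "two")
fit_releveled <- lm(y ~ engine_type + cylinders_releveled)

print(coef(fit_default))
print(coef(fit_releveled))

# Both fits describe the same model, so the fitted values agree.
same_fit <- isTRUE(all.equal(unname(fitted(fit_default)),
                             unname(fitted(fit_releveled))))
print(same_fit)
```

Since the fit is the same either way, an NA coefficient on a single level can simply be read as "this level is already explained by another term".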


#5

Hi Prakash,
I did not get your point. Could you please explain in a little more detail?


#6

I meant you can try interactions between predictors to see whether you get estimates instead of NA,
like
lm(formula = y ~ f1 * f2)

Here, f1 and f2 are predictors.
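
For reference, a small sketch of what the interaction formula expands to, using synthetic factors f1 and f2 (the data here is invented for illustration). In R's formula syntax, f1 * f2 is shorthand for f1 + f2 + f1:f2. In this toy example the interaction fit has at least as many NA coefficients as the main-effects fit, because combinations of levels that never occur together cannot be estimated, so whether interactions help is something to check on the actual data.

```r
# Hypothetical factors: level "r" of f2 occurs only with level "c" of f1.
set.seed(1)
f1 <- factor(rep(c("a", "b", "c"), each = 20))
f2 <- factor(c(rep(c("p", "q"), 20), rep("r", 20)))
y  <- rnorm(60)

# f1 * f2 expands to the main effects plus their interaction.
term_labels <- attr(terms(y ~ f1 * f2), "term.labels")
print(term_labels)

fit_main <- lm(y ~ f1 + f2)
fit_int  <- lm(y ~ f1 * f2)

# Count the coefficients that could not be estimated in each fit.
n_na_main <- sum(is.na(coef(fit_main)))
n_na_int  <- sum(is.na(coef(fit_int)))
print(c(main = n_na_main, interaction = n_na_int))
```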