Using linear regression with a categorical target variable


The data set has 9 columns. The first 8 columns give various measured properties of wine as floating-point numbers, and the final column is a factor (categorical variable) representing the perceived quality of the wine.

Is it appropriate to fit a linear regression model to predict the quality of a wine, given its other properties which are numeric (columns 1 to 8)?

Logistic regression is a good choice for this, but I wanted to know: will linear regression work if I convert the factor to numeric? Or is that not possible, because wine quality is not a numeric quantity to predict? Or can linear regression be used anyway (without converting), because linear models fit any classification problem?


Hello @ramya_keerthana

Using linear regression for classification problems is generally not a good choice.
Let me try to explain this with a very common example.

Let's say you have to predict whether a tumor is malignant or not based on its size. Here the green line shown is our regression line.

So, in the above case, we can say that tumors for which the regression line predicts a value greater than 0.5 are malignant and the rest are not. That way we have easily solved a classification problem with linear regression.

But consider another case.

Here, classifying tumors with a predicted value greater than 0.5 as malignant no longer works. We would need to change the threshold to something like 0.2 to make our predictions correct.

But we cannot change the threshold each time a new sample arrives. Instead, our algorithm should learn it from the training data and then make correct predictions for data we haven't seen before.
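
To make the threshold problem concrete, here is a minimal sketch in plain Python. The tumor sizes and labels are made-up numbers chosen only for illustration (they are not taken from the plots above), and the helper names `fit_line`, `size_at_cutoff` and `classify` are hypothetical. It fits a least-squares line to 0/1 labels, then refits after adding one very large malignant tumor, and compares where each line crosses 0.5.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def size_at_cutoff(slope, intercept, cutoff=0.5):
    """Tumor size at which the fitted line reaches the cutoff value."""
    return (cutoff - intercept) / slope


def classify(xs, slope, intercept, cutoff=0.5):
    """Label a tumor 1 (malignant) when the line predicts at least the cutoff."""
    return [1 if slope * x + intercept >= cutoff else 0 for x in xs]


# Made-up data (assumption): sizes 1-4 are benign (0), sizes 5-8 malignant (1).
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

slope_a, intercept_a = fit_line(sizes, labels)
boundary_a = size_at_cutoff(slope_a, intercept_a)
preds_a = classify(sizes, slope_a, intercept_a)

# Same data plus one very large malignant tumor (again made up).
sizes_b = sizes + [40]
labels_b = labels + [1]

slope_b, intercept_b = fit_line(sizes_b, labels_b)
boundary_b = size_at_cutoff(slope_b, intercept_b)
preds_b = classify(sizes_b, slope_b, intercept_b)

# Sizes whose predicted label disagrees with the true label in the second fit.
misclassified_b = [x for x, y, p in zip(sizes_b, labels_b, preds_b) if y != p]

print("boundary without the large tumor:", round(boundary_a, 2))
print("boundary with the large tumor:   ", round(boundary_b, 2))
print("misclassified sizes in second fit:", misclassified_b)
```

On this toy data the first fit separates the two groups with the 0.5 cutoff, while the single extra point flattens the line enough that a malignant tumor near the boundary falls below 0.5. The extra point is itself classified correctly, yet it still moves the boundary for everything else, which is the behaviour described above.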

Hope this clears your doubt.
For more, you can also refer to this discussion: Using linear regression for a classification problem