Using linear regression for a classification problem



I was working on a data set that has different attributes related to banking.

Here the dependent variable (RESP) is categorical, while most of the other variables are numeric. Because most of the independent variables are numeric, I wanted to apply linear regression instead of logistic regression. Can we use linear regression for a classification problem? If yes, how do we convert the predictions into 0 or 1? And isn't it theoretically absurd?

About using Linear Regression with different types

@Corporate_Cowboy - Please look at this post

Hope this helps!



Applying linear regression for classification is not an absurd idea, but logistic regression or other classification methods are preferred over it.
You can apply linear regression for classification by choosing a threshold. An online course by Andrew Ng has an example where he fits a line to the data set and uses 0.5 as the threshold for classification: predictions at or above 0.5 are labelled 1, the rest 0.
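
To make the thresholding idea concrete, here is a minimal sketch in plain Python. The data are made up for illustration (a single hypothetical predictor called `balance` and a 0/1 `resp`), and `fit_line` and `classify` are helper names I am assuming here, not anything from the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def classify(x, intercept, slope, threshold=0.5):
    """Label 1 when the fitted value reaches the threshold, otherwise 0."""
    return 1 if intercept + slope * x >= threshold else 0


# Hypothetical toy data: the 0/1 response is treated as a number.
balance = [1, 2, 3, 4, 5, 6, 7, 8]
resp = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(balance, resp)
predicted = [classify(x, intercept, slope) for x in balance]
print(predicted)
```

On this clean, balanced toy set the thresholded line reproduces the labels. That is the favourable case; it says nothing about messier data.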

That said, you should also look at other classification techniques.
Hope this helps!
Syed Danish


Hi @syed.danish

This is a very special case, and I am sure Andrew mentioned it. Check the residuals in most cases and you will find out why linear regression is not adequate for classification.
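
A rough way to see this, again with made-up numbers: take a clean 0/1 split and add one extreme positive case. The data and the helper `fit_line` below are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data: a clean split at x = 5 plus one extreme positive case.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 40]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
labels = [1 if f >= 0.5 else 0 for f in fitted]

# Print each case with its fitted value, residual, and thresholded label.
for x, y, f, r, lab in zip(xs, ys, fitted, residuals, labels):
    print(f"x={x:>2}  y={y}  fitted={f:.3f}  residual={r:+.3f}  label={lab}")
```

The single extreme point flattens the line: the fitted value at x = 5 drops to about 0.49, so a true 1 is labelled 0, and the fitted value at x = 40 is about 1.2, which cannot be read as a probability. The residuals also show a clear pattern instead of scattering evenly around zero.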



Thank you all for the help.

The discussion from that link was really helpful @hinduja1234. There are some articles given over there which proved to be really useful in understanding the concept. @syed.danish I saw that very video and was able to understand the concept.

I think that it is safe to say that when all the assumptions of linear regression are met, we tend to find linear regression suitable for a classification problem. Correct me if I am wrong.


When the classification is binary, as in this case, it is better to prefer logistic regression. Linear regression will do the job, but its accuracy will usually be lower than that of logistic regression.


Thanks @Brijesh.


The decision to select linear or logistic regression is not based on the type of the predictor variables. It is based on the type of the dependent variable. Since your dependent variable is categorical with two classes, you should go with logistic regression.
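
For reference, the two models in standard textbook notation (the symbols $\beta_0$, $\beta$ and $x$ are not from this thread). Numeric predictors enter both in the same way; only the treatment of the response differs.

$$
\text{linear: } \mathbb{E}[y \mid x] = \beta_0 + \beta^\top x
\qquad
\text{logistic: } P(\mathrm{RESP} = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x)}}
$$

The logistic form always stays between 0 and 1, while the linear form is unbounded.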