One Hot Encoding



A recent blog post on AV discusses One Hot Encoding on the loan_prediction data set and explains it as follows:
Let's take a look at an example from the loan_prediction data set. The feature Dependents has 4 possible values, 0, 1, 2 and 3+, which are then encoded without loss of generality as 0, 1, 2 and 3.

We then have a weight “W” assigned to this feature in a linear classifier, which will make a decision based on the constraint W*Dependents + K > 0, or equivalently W*Dependents > -K.

Let f(w)= W*Dependents

The possible values this expression can take are 0, W, 2W and 3W. The problem is that a single weight “W” cannot separate the four choices freely. It can only reach a decision in the following ways:

  • All levels lead to the same decision (all of them fall on the same side of the threshold)
  • 3:1 division of the levels (Decision boundary at f(w)>2W)
  • 2:2 division of the levels (Decision boundary at f(w)>W)
    Here we can see that we are losing many other possible decisions, such as the case where “0” and “2W” should be given the same label and “3W” and “W” are the odd ones out.

It would be helpful if someone could explain in more detail the 3 ways of reaching a decision listed above, and why many other decisions are lost. Thanks



Hi there,

I will try to explain it in relatively simple terms.

Without one hot encoding, or when a categorical variable is not converted to a factor type, models like logistic regression treat these categories (0, 1, 2, 3) as numbers, just as we do: 2 is greater than 1, and so on.
So if you wish to create a decision boundary, the only kind this method allows is a threshold on that number, say: if the value x is greater than 2*W, then it should belong to category A.

But what if 3 and 0 should both belong to A? Then treating the categories as numbers will not work, because any threshold that puts 0 and 3 on the same side also puts 1 and 2 on that side. What if we instead convert each category to a new feature? The model will then learn a separate weight for each category, and if the pattern is really present in the data, it can give similar weights to 3 and 0.
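
To make this concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the labels (0 and 3 belong to A), the hand-picked one-hot weights, and the grid of W and K values are assumptions, not anything learned from the loan_prediction data. It checks that a rule with one weight per category can reproduce the labels, while no single-weight rule W*Dependents + K > 0 on the grid can.

```python
from itertools import product

levels = [0, 1, 2, 3]
# Hypothetical labels: 1 means category A, 0 means the other category.
target = {0: 1, 1: 0, 2: 0, 3: 1}

def one_hot(d):
    """Turn a level into four 0/1 features, one per category."""
    return [1 if d == level else 0 for level in levels]

# One weight per category, picked by hand here (a real model would learn them).
weights = [1.0, -1.0, -1.0, 1.0]
bias = 0.0

def predict_one_hot(d):
    score = sum(w * x for w, x in zip(weights, one_hot(d))) + bias
    return 1 if score > 0 else 0

def predict_numeric(d, W, K):
    """Single-weight rule on the numeric encoding: W*d + K > 0."""
    return 1 if W * d + K > 0 else 0

one_hot_preds = {d: predict_one_hot(d) for d in levels}

# Try a coarse grid of W and K values from -5 to 5 in steps of 0.25.
grid = [x / 4 for x in range(-20, 21)]
numeric_can_match = any(
    all(predict_numeric(d, W, K) == target[d] for d in levels)
    for W, K in product(grid, grid)
)

print("one-hot predictions:", one_hot_preds)
print("some (W, K) on the grid matches the labels:", numeric_can_match)
```

The grid search is only a demonstration, not a proof. The underlying reason is that W*d + K moves in one direction as d increases, so it cannot be positive at 0 and 3 while being non-positive at 1 and 2.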

Hope I made things clear.




Thanks for explaining. Can you please elaborate on the following:
3:1 division of the levels (Decision boundary at f(w)>2W)
2:2 division of the levels (Decision boundary at f(w)>W)




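It just means that among the values 0, W, 2W and 3W, any value x greater than 2W will belong to one category and the rest to the other. Only 3W is greater than 2W, so this effectively creates a decision boundary with a 3:1 division. In the same way, a boundary at W puts 2W and 3W on one side and 0 and W on the other, which is the 2:2 division.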
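
A small Python sketch of the two divisions, assuming a made-up positive weight W = 1.5 (any positive value gives the same splits):

```python
W = 1.5  # hypothetical positive weight, chosen only for illustration
levels = [0, 1, 2, 3]
scores = [W * d for d in levels]  # 0, W, 2W, 3W

def split(threshold):
    """Group the levels by whether their score is greater than the threshold."""
    above = [d for d, s in zip(levels, scores) if s > threshold]
    below = [d for d, s in zip(levels, scores) if s <= threshold]
    return below, above

split_3_1 = split(2 * W)  # boundary at f(w) > 2W
split_2_2 = split(W)      # boundary at f(w) > W

print("boundary at 2W:", split_3_1)
print("boundary at W: ", split_2_2)
```

With the boundary at 2W only level 3 ends up above it, and with the boundary at W levels 2 and 3 do. In both cases the levels on each side are consecutive, which is why a grouping like {0, 2} versus {1, 3} can never come out of a single weight.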