Why create dummy variables?



I see that people sometimes use indicator or dummy variables when they work on predictive modeling. Is it necessary to use them, and if so, when?


Hi @Eddie84,

Many machine learning algorithms cannot work with categorical variables directly. If you have a variable ‘Gender’ with the categories male and female, you need to convert it into numeric values before the model can use it. One way to do this is to create dummy variables (also called one-hot encoding).
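To make that concrete, here is a minimal sketch in plain Python of what creating dummy variables looks like. The data and the helper name `make_dummies` are made up for illustration; they are not from any particular library.

```python
# Hypothetical toy data: the values below are for illustration only.
genders = ["male", "female", "female", "male"]


def make_dummies(values, prefix):
    """Return one 0/1 indicator column per distinct category in `values`."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


dummies = make_dummies(genders, prefix="Gender")
for name, column in dummies.items():
    print(name, column)
# Gender_female [0, 1, 1, 0]
# Gender_male [1, 0, 0, 1]
```

Each category becomes its own 0/1 column, so the model only ever sees numbers. In practice you would probably not write this by hand: if you use pandas, `pandas.get_dummies` does the same job on a DataFrame column.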

Here is an article that will help you understand the concept: