I am working on a dataset in which I have to predict the purchase amount of different customers. All of the independent variables are categorical. Is there any way to handle these categorical variables other than converting them into numerical variables? If not, can someone suggest how to go about it?

# Is it necessary to convert categorical variables into numeric variables when fitting a model?

Most models work only with numbers, so if you have text values you have to assign a number to each string and use these numbers instead.

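If you use the sklearn module in Python, you can use LabelEncoder to convert text values to numbers. (Strictly speaking, LabelEncoder is intended for the target column; for input features, OrdinalEncoder does the same job.)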

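When you get predictions, you can also use it to convert the numbers back to text values with its inverse_transform method. This only matters when the predicted variable is itself categorical; a purchase amount is already numeric.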
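A minimal sketch of the encode/decode round trip, assuming a recent scikit-learn. The city values are hypothetical. `LabelEncoder` sorts the classes alphabetically, so the resulting numbers carry no meaning of their own.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical city values, purely for illustration
cities = ["Paris", "Tokyo", "Paris", "Lima"]

le = LabelEncoder()
codes = le.fit_transform(cities)  # classes sorted alphabetically: Lima=0, Paris=1, Tokyo=2
print(codes.tolist())             # [1, 2, 1, 0]

# Map numbers (e.g. predicted class codes) back to the original strings
decoded = le.inverse_transform([0, 2])
print(decoded.tolist())           # ['Lima', 'Tokyo']
```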

The problem you are describing is a regression problem, in which the categorical data has to be converted to a numeric format. Common options are binary encoding (True or False to 1 or 0), ordinal encoding for data that has a natural order (for example coldest, cold, hot to 0, 1, 2), and one-hot encoding, which turns each possible value into its own column.
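The three encodings can be sketched with pandas and scikit-learn as below. The column names and values are hypothetical. Note that `OrdinalEncoder` needs the order passed explicitly through `categories`; otherwise it sorts the categories alphabetically, which would put "cold" before "coldest".

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data
df = pd.DataFrame({
    "is_member": [True, False, True],           # binary
    "temperature": ["cold", "coldest", "hot"],  # ordered categories
    "city": ["Paris", "Tokyo", "Lima"],         # unordered categories
})

# Binary encoding: True/False -> 1/0
df["is_member"] = df["is_member"].astype(int)

# Ordinal encoding with an explicit order: coldest=0, cold=1, hot=2
enc = OrdinalEncoder(categories=[["coldest", "cold", "hot"]])
df["temperature"] = enc.fit_transform(df[["temperature"]]).ravel()

# One-hot encoding: one 0/1 column per city, appended at the end
df = pd.get_dummies(df, columns=["city"], dtype=int)
print(df)
```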

One way to handle categorical variables is to create a column for each category. For example, if you have the three categories vegetarian, non-vegetarian and vegan, you can create three columns (vegetarian, non-vegetarian, vegan) and use True or False to indicate which category each person belongs to.
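A small sketch of this with `pandas.get_dummies`, using hypothetical diet values. Recent pandas versions return True/False columns; `astype(int)` turns them into 0/1 if a model expects integers.

```python
import pandas as pd

# Hypothetical diet column for four customers
diet = pd.Series(["vegetarian", "vegan", "non-vegetarian", "vegetarian"])

# One column per category (sorted by name); each row is True in exactly one column
onehot = pd.get_dummies(diet)
print(onehot)

# Convert booleans to explicit 0/1 integers
onehot = onehot.astype(int)
print(onehot.sum(axis=1).tolist())  # [1, 1, 1, 1]
```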

Dear sumi,

1) If all values are categorical, you can use one-hot encoding, label encoding, etc. to convert them to numbers. However, one-hot encoding creates one column per category, so with many categories the data becomes very high-dimensional, which is not advisable.

2) Try to eliminate uninformative categorical columns by using a chi-squared test.

3) Use PCA to reduce the number of columns in the dataset.

4) You can use the CatBoost algorithm, which works well with categorical features.
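Steps 1 and 2 can be combined roughly as below: one-hot encode the features, then keep the columns with the highest chi-squared scores. The data is hypothetical, and two further assumptions are made for illustration: scikit-learn's `chi2` needs a discrete target, so the purchase amount is binned into low/high halves with `pd.qcut`, and `k=2` is arbitrary.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical customers: categorical features and a numeric purchase amount
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "city":   ["A", "B", "A", "C", "B", "C"],
    "amount": [120.0, 40.0, 150.0, 35.0, 110.0, 50.0],
})

# chi2 needs non-negative features: one-hot encode them as 0/1 columns
X = pd.get_dummies(df[["gender", "city"]], dtype=int)

# Assumption: bin the amount at its median into 0 = low, 1 = high
y = pd.qcut(df["amount"], q=2, labels=False)

# Keep the k columns with the highest chi-squared statistic
selector = SelectKBest(chi2, k=2).fit(X, y)
print(X.columns[selector.get_support()].tolist())
```

Here gender separates low and high spenders perfectly in the toy data, so both gender columns score highest and are kept.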

Hi,

But True or False is a boolean value, right? Do we have to convert it into integers before it can be passed into the model? And what if we need to predict something using multiple attributes?