How to create dummy variables for n-1 categories in Python?

dummy_variable
python

#1

I want dummy variables for Quarter, i.e. Q1, Q2, Q3 and Q4.

As per statistics, there should be only 3 dummy variables (Q2, Q3, Q4), with Q1 as the base category.
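The statistical reason for dropping one category is the so-called dummy variable trap: in a model with an intercept, the four quarter dummies always sum to 1, so together they are perfectly collinear with the intercept column. A minimal NumPy sketch, using made-up quarter labels purely for illustration, shows the rank deficiency and how dropping Q1 removes it:

```python
import numpy as np

# Hypothetical quarter labels, purely for illustration
quarters = np.array(["Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"])
levels = ["Q1", "Q2", "Q3", "Q4"]

# One 0/1 column per quarter (all four categories)
all_dummies = np.column_stack([(quarters == q).astype(float) for q in levels])
intercept = np.ones((len(quarters), 1))

# Intercept + 4 dummies: the dummy columns sum to the intercept column
X_full = np.hstack([intercept, all_dummies])
# Intercept + 3 dummies, with Q1 as the base category
X_reduced = np.hstack([intercept, all_dummies[:, 1:]])

# Number of columns vs. matrix rank for each design matrix
print(X_full.shape[1], np.linalg.matrix_rank(X_full))
print(X_reduced.shape[1], np.linalg.matrix_rank(X_reduced))
```

The full design matrix has 5 columns but rank 4, so its coefficients are not uniquely identifiable; the reduced one has full column rank.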

My code, however, gives 4 categories:

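```python
from sklearn import preprocessing

cat_column = ['Quarter']
cat_data = train['Quarter'].tolist()
data_ = train[cat_column]
le = preprocessing.LabelEncoder()
le.fit(cat_data)
# LabelEncoder expects a 1-D array, so pass the column as a Series,
# then reshape to the 2-D (n_samples, 1) shape OneHotEncoder expects
newcol = le.transform(data_['Quarter']).reshape(-1, 1)
# `sparse=False` is spelled `sparse_output=False` in scikit-learn >= 1.2
enc = preprocessing.OneHotEncoder(sparse=False)
enc.fit(newcol)
# n_values_ and feature_indices_ exist only in older scikit-learn (removed in 0.22)
print(enc.n_values_)
print(enc.feature_indices_)
dummy_var4 = enc.transform(newcol)
```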

Output:

```
[4]
[0 4]

array([[ 1., 0., 0., 0.],
       [ 1., 0., 0., 0.],
       [ 1., 0., 0., 0.],
       [ 1., 0., 0., 0.],
       [ 1., 0., 0., 0.],
       [ 1., 0., 0., 0.],
       [ 1., 0., 0., 0.]])
```


#2

Hi @Swapnil_Sharma,

I don’t know if any function can do this, but you can create dummy variables yourself.

  • How? Create a 1/0 column for each of Q2, Q3 and Q4 using simple if/else statements.
  • How does this take Q1 as the base? A row with 0, 0, 0 in the new Q2, Q3 and Q4 variables represents Q1.
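A minimal pandas sketch of this manual approach, assuming a hypothetical `train` frame with a `Quarter` column; a vectorized comparison stands in for an explicit if/else loop but produces the same 1/0 columns:

```python
import pandas as pd

# Hypothetical stand-in for the real `train` DataFrame
train = pd.DataFrame({"Quarter": ["Q1", "Q2", "Q3", "Q4", "Q2", "Q1"]})

# One 1/0 column per non-base quarter; Q1 is the base, so it gets no column
for q in ["Q2", "Q3", "Q4"]:
    train[q] = (train["Quarter"] == q).astype(int)

# Rows where Q2, Q3 and Q4 are all 0 are exactly the Q1 rows
print(train)
```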

Hope this helps.

Regards,
Aayush


#3

I have found the solution.
There is a pandas function, pd.get_dummies(), that does this; passing drop_first=True drops the first category (Q1), so only n-1 dummy columns are created.
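A short sketch of that call, assuming a hypothetical `train` frame with a `Quarter` column. `get_dummies` sorts the categories, so `drop_first=True` removes Q1 and keeps Q2, Q3 and Q4; `dtype=int` asks for 0/1 integers, since recent pandas versions otherwise return booleans.

```python
import pandas as pd

# Hypothetical stand-in for the real `train` DataFrame
train = pd.DataFrame({"Quarter": ["Q1", "Q2", "Q3", "Q4", "Q2", "Q1"]})

# k-1 = 3 dummy columns with Q1 as the base category
dummies = pd.get_dummies(train["Quarter"], prefix="Quarter", drop_first=True, dtype=int)

# Attach the dummies to the original frame
train = pd.concat([train, dummies], axis=1)
print(train)
```

Equivalently, `pd.get_dummies(train, columns=["Quarter"], drop_first=True, dtype=int)` replaces the `Quarter` column with its dummies in one step.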

Sorry for the late reply.


#4

This is another technique for creating dummy variables from discrete variables in Python: http://python-apuntes.blogspot.com/2017/04/creacion-de-variables-de-grupo.html