How to do one hot encoding in R

r
dummy_variable

#1

Hello,

In Python, we can do one-hot encoding like this:

import pandas as pd

# One-hot encode the categorical features (df_all is an existing DataFrame)
ohe_feats = ['gender', 'signup_method', 'signup_flow', 'language', 'affiliate_channel', 'affiliate_provider', 'first_affiliate_tracked', 'signup_app', 'first_device_type', 'first_browser']
for f in ohe_feats:
    df_all_dummy = pd.get_dummies(df_all[f], prefix=f)  # one column per category
    df_all = df_all.drop([f], axis=1)                   # drop the original column
    df_all = pd.concat((df_all, df_all_dummy), axis=1)  # append the dummy columns

However, I could not find a package in R that does the same thing as simply.
Can someone point me to a suitable R library for this?


#2

Hello @pagal_guy,

Something like the code below should do it:

# One-hot encode the categorical features (df_all is assumed to be a data.frame)
library(ade4)
library(data.table)
ohe_feats = c('gender', 'signup_method', 'signup_flow', 'language', 'affiliate_channel',
              'affiliate_provider', 'first_affiliate_tracked', 'signup_app', 'first_device_type', 'first_browser')
for (f in ohe_feats){
  df_all_dummy = acm.disjonctif(df_all[f])  # one 0/1 column per level of f
  df_all[f] = NULL                          # drop the original column
  df_all = cbind(df_all, df_all_dummy)      # append the dummy columns
}

Hope this helps!!


#3

@shuvayan Can you explain this code more clearly? I didn't understand the code you wrote. Can you help me with R code for this?

Thanks,
Rohit


#4

Hello @Rohit_Nair,

For each categorical variable in the list ohe_feats, acm.disjonctif creates the dummy variables. The next line drops that categorical variable from the original data, and the line after that adds all the dummy variables to the original data.
Hope this helps!!
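
The three steps described above can be tried on a toy data frame. This is only a minimal sketch: it uses base R's model.matrix in place of acm.disjonctif so that it runs without ade4 (the column names it produces differ slightly), and the data frame toy and its columns are made up for illustration.

```r
# Toy data (hypothetical): two categorical columns and one numeric column.
toy <- data.frame(
  gender   = c("M", "F", "F"),
  language = c("en", "fr", "en"),
  age      = c(31, 25, 40),
  stringsAsFactors = TRUE
)

ohe_feats <- c("gender", "language")
for (f in ohe_feats) {
  # Step 1: build one 0/1 column per level; "- 1" removes the intercept
  # so that every level gets its own column.
  dummies <- as.data.frame(model.matrix(~ . - 1, data = toy[f]))
  # Step 2: drop the original categorical column.
  toy[f] <- NULL
  # Step 3: append the dummy columns.
  toy <- cbind(toy, dummies)
}
print(names(toy))
```

After the loop, toy should contain age plus one indicator column per level of gender and language.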


#5

OK, thanks @shuvayan :slightly_smiling:


#6

Hi @Rohit_Nair,

From code shared by @Rohan_Rao, I learnt the following way of doing one-hot encoding.

Using the dummies library:
library(dummies)
df <- dummy.data.frame(df, names=c("MyField1"), sep="_")

Note: This splits the original field into one column per unique value. The original field is no longer available in the data frame.

Example:

Data:

After:
df <- dummy.data.frame(df, names=c("MyField1"), sep="_")

Note that the method shown by @shuvayan also removes the original field (via df_all[f] = NULL); leave that line out if you want to keep it. Hope this helps.


#7

@sadashivb Can you help me with this error? I googled it but couldn't find a solution.

library(dummies)
df <- dummy.data.frame(Clean_data, names=c("Gender"), sep="_")
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

$ Gender : chr "Male" "Male" "Male" "Male" ...
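
A possible cause, offered as an assumption because the thread does not show the class of Clean_data: this error is commonly reported when the input to dummy.data.frame is a tibble or data.table rather than a plain data.frame, in which case wrapping it in as.data.frame() first may help. Independently of that, the Gender column can be encoded with base R alone. The sketch below uses a made-up stand-in for Clean_data.

```r
# Hypothetical stand-in for Clean_data; only the Gender column mirrors the post.
Clean_data <- data.frame(
  Gender = c("Male", "Male", "Female", "Male"),
  Age    = c(34, 28, 45, 51),
  stringsAsFactors = FALSE
)

# Make sure we are working with a plain data.frame and a factor column.
Clean_data <- as.data.frame(Clean_data)
Clean_data$Gender <- factor(Clean_data$Gender)

# One 0/1 column per level ("- 1" removes the intercept).
gender_dummies <- as.data.frame(model.matrix(~ Gender - 1, data = Clean_data))
names(gender_dummies) <- paste0("Gender_", levels(Clean_data$Gender))

# Replace the original column with its dummy columns.
encoded <- cbind(Clean_data[setdiff(names(Clean_data), "Gender")], gender_dummies)
str(encoded)
```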


#8

One-hot encoding can be done in R using model.matrix, which is simple and easy.
Here is an example:

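FactoredVariable = factor(df$Any)
# [, -1] drops the intercept column; drop = FALSE keeps a matrix when only one dummy column remains
dumm = as.data.frame(model.matrix(~FactoredVariable)[, -1, drop = FALSE])
dfWithDummies = cbind(df, dumm)  # the original column is kept alongside the dummies
str(dfWithDummies)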

You can also look into the caret package; it offers various data preprocessing and modeling tools that make life easier.
Thanks, hope it helps!
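
One point worth noting about the model.matrix approach: dropping the first column with [, -1] removes the intercept, which leaves k - 1 dummy columns for a factor with k levels (the first level acts as the reference). If one column per level is wanted instead, the intercept can be removed in the formula. A minimal sketch on made-up data, where df and its columns are hypothetical:

```r
# Hypothetical data frame with a three-level categorical column.
df <- data.frame(Any = c("a", "b", "c", "a"), value = 1:4)
df$Any <- factor(df$Any)

# Reference (k - 1) coding: intercept column dropped, level "a" is the baseline.
# drop = FALSE keeps a matrix even when only one column remains.
ref_coded <- as.data.frame(model.matrix(~ Any, data = df)[, -1, drop = FALSE])

# Full one-hot (k columns): remove the intercept in the formula.
one_hot <- as.data.frame(model.matrix(~ Any - 1, data = df))

print(names(ref_coded))
print(names(one_hot))
```

The k - 1 form is usually what linear models expect, while the full form matches what pd.get_dummies returns by default.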


#9

@pagal_guy
I think this should work for you:

df1 <- within (df,newcolumnname <- match(df$columnname,unique(df$columnname)))
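
Note that match() against unique() produces a single integer code per category (label encoding), not one 0/1 column per category. A small sketch contrasting the two, on made-up data where only the names columnname and newcolumnname come from the post:

```r
# Hypothetical data frame; "columnname" follows the naming in the post above.
df <- data.frame(columnname = c("red", "blue", "red", "green"))

# Label encoding: each distinct value gets an integer, in order of first appearance.
df1 <- within(df, newcolumnname <- match(columnname, unique(columnname)))

# One-hot encoding of the same column, for comparison.
one_hot <- model.matrix(~ columnname - 1, data = df)

print(df1$newcolumnname)
print(colnames(one_hot))
```

Integer codes imply an ordering between categories, which may or may not be acceptable depending on the model.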