How to convert from category to numeric

r
factor_analysis
data_wrangling

#1

str(data)
Classes ‘tbl_df’, ‘tbl’ and ‘data.frame’: 227745 obs. of 37 variables:
 $ county     : chr "Buskerud" "Buskerud" "Oslo" "Oslo" ...
 $ age_group  : chr "24-34" "65+" "55-64" "24-34" ...
 $ gender     : int 2 2 2 2 2 2 2 2 2 2 ...
 $ life_phase : chr "Småbarnsfamilie" "SeniorEnslig" "MiddelaldrendeEnslig" "UngdomStudent" ...
 $ drivetime  : int 30 30 30 30 30 30 30 30 30 30 ...
 $ couple_pair: chr "P" "E" "E" "E" ...
 $ kids       : int 1 0 0 0 1 0 0 0 0 1 ...
 $ house_type : chr "Eneboliger" "Rekkehus, kjedehus og andre småhus" "Tomannsboliger" "Store boligbygg (blokk)" ...
 $ car_type   : chr "VAN_SEGMENT" "B_SMAABILER" "D_MELLOMKLASSEN" "U_NONE" ...

char_var <- c("county", "age_group", "life_phase", "couple_pair", "house_type", "car_type")
class(char_var)

I have 6 character variables out of 37. I need to convert those character variables to numeric and include them in my data frame.


#2

Hi @KumarP

You cannot simply convert a categorical variable to numeric, because a categorical variable has no order. If you want differences between observations, you can work with a distance such as the Gower distance. If you use a model such as linear regression, then in R the transformation to dummy encoding is done for you by the function lm() or glm().
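
To make the dummy encoding concrete, here is a minimal sketch using base R's model.matrix(), which builds the same design matrix that lm() uses internally. The column names are borrowed from the question; the values are invented for illustration.

```r
# Toy data: column names from the thread, values invented for illustration.
toy <- data.frame(
  drivetime = c(30, 45, 20, 60, 35, 50),
  car_type  = factor(c("VAN_SEGMENT", "B_SMAABILER", "U_NONE",
                       "B_SMAABILER", "U_NONE", "VAN_SEGMENT"))
)

# model.matrix() expands the factor into 0/1 dummy columns
# (treatment contrasts by default, first level as the reference).
X <- model.matrix(drivetime ~ car_type, data = toy)
print(colnames(X))

# lm() performs the same expansion internally, so the factor is passed as is.
fit <- lm(drivetime ~ car_type, data = toy)
print(names(coef(fit)))
```

With three levels, the factor contributes two dummy columns next to the intercept; the first level is absorbed into the intercept.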

Hope this helps.
Alain


#3

I tried it like this, but I get an error message:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

str(NSBdata)
# Converting categorical to numerical variables
##############
cols <- c("county", "age_group", "life_phase", "couple_pair", "house_type", "car_type")

for (i in cols) {
  NSBdata[, i] <- as.factor(NSBdata[, i])
}
str(NSBdata)
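
A likely cause of the error: the str() output in #1 shows that the data is a tibble (class tbl_df), and on a tibble NSBdata[, i] returns a one-column tibble rather than a vector, so as.factor() receives a list. A minimal sketch of the usual workaround, extracting the column with [[ ]]. The data frame below is a made-up stand-in, and drop = FALSE is used only to mimic the tibble behaviour without loading any package.

```r
# Small stand-in for NSBdata; values are invented for illustration.
toy <- data.frame(
  county    = c("Buskerud", "Oslo", "Oslo"),
  age_group = c("24-34", "65+", "55-64"),
  kids      = c(1L, 0L, 0L),
  stringsAsFactors = FALSE
)
cols <- c("county", "age_group")

# A tibble keeps the data-frame structure for x[, i]; drop = FALSE mimics that.
still_a_frame <- is.data.frame(toy[, "county", drop = FALSE])
print(still_a_frame)

# [[ ]] extracts the column itself, for data frames and tibbles alike.
for (i in cols) {
  toy[[i]] <- as.factor(toy[[i]])
}
str(toy)
```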


#4

I converted the columns to factors, but when I change them to numeric they are not transformed properly.
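
For context, a short sketch of what converting a factor to a number actually returns: the internal level codes (1, 2, 3, ... in level order), not anything derived from the labels themselves. The values below are invented.

```r
age_group <- factor(c("24-34", "65+", "55-64", "24-34"))

# Levels are sorted by default, so the codes follow that sorted order.
print(levels(age_group))

# Each value is replaced by the position of its label in levels(age_group).
codes <- as.integer(age_group)
print(codes)

# The codes map back to the labels through the levels.
labels_back <- levels(age_group)[codes]
print(labels_back)
```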


#5

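I did it like this:
# Converting categorical to numerical variables
##############################################################
# Turn all character columns into factor columns.
# stringsAsFactors = TRUE is needed from R 4.0.0 on, where the default became FALSE.
DF <- as.data.frame(unclass(NSBdata), stringsAsFactors = TRUE)
str(DF)

# Replace every column by its numeric values (factors become their level codes)
# and collect the result in a numeric matrix.
DF <- as.matrix(as.data.frame(lapply(DF, as.numeric)))
##############################################################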


#6

for (col in colnames(DF)) {
  if (typeof(DF[, col]) == "character") {
    new_col <- DF[, col]
    new_col[is.na(new_col)] <- "missing"  # give missing values their own level
    DF[col] <- as.factor(new_col)
  }
}


#7

I think this might help you in your for loop:

df1 <- within(df, newcolumnname <- match(columnname, unique(columnname)))

This code converts your character variable to numeric, assigning a unique integer code to each distinct character value.
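
A small runnable illustration of the match()/unique() idiom; df, columnname and newcolumnname are placeholder names, and the values are invented.

```r
df <- data.frame(columnname = c("P", "E", "E", "P", "S"),
                 stringsAsFactors = FALSE)

# unique() lists the distinct values in order of first appearance;
# match() returns, for each row, the position of its value in that list.
df1 <- within(df, newcolumnname <- match(columnname, unique(columnname)))
print(df1)
```

Note that the codes follow the order of first appearance in the data, whereas factor level codes follow the sorted order of the labels, so the two approaches can number the same values differently.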


#8

But I have 6 columns of type factor, and I need to convert those specific columns to numeric. Please share sample code, because I am building a model with XGBoost.
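
One possible sketch for converting only selected factor columns, using base R. The data frame is a made-up stand-in with two of the six columns, and no xgboost call is shown; the assumption is only that the model needs an all-numeric matrix as input. Keep in mind that integer codes impose an arbitrary order on the categories.

```r
# Stand-in for the real data; values are invented for illustration.
toy <- data.frame(
  county   = factor(c("Buskerud", "Oslo", "Oslo")),
  car_type = factor(c("U_NONE", "B_SMAABILER", "U_NONE")),
  kids     = c(1L, 0L, 0L)
)
cols <- c("county", "car_type")  # extend to all six factor columns

# Replace each listed factor column by its integer level codes.
toy[cols] <- lapply(toy[cols], as.integer)

# data.matrix() then returns an all-numeric matrix.
m <- data.matrix(toy)
str(m)
```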


#9

data1 <- within(data, country_new <- match(county, unique(county)))

This is sample code. You can apply it in a for loop.


#10

for (col in colnames(NSBdata)) {
  if (typeof(NSBdata[, col]) == "character") {
    new_col = NSBdata[, col]
    # match(NSBdata$car_type, unique(NSBdata$car_type))
    within(NSBdata, new_col <- match(NSBdata[col], unique(NSBdata[col])))
  }
}

Is this right? I am a little bit confused.
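
For reference, two things keep the loop above from working as intended: the result of within() is never assigned, so NSBdata stays unchanged, and NSBdata[col] is a one-column data frame rather than a vector, so match() does not compare individual values. A minimal sketch of one way to write it, using a made-up stand-in for NSBdata and [[ ]] to extract each column as a vector:

```r
# Stand-in for NSBdata; values are invented for illustration.
toy <- data.frame(
  county   = c("Buskerud", "Oslo", "Oslo"),
  car_type = c("U_NONE", "B_SMAABILER", "U_NONE"),
  kids     = c(1L, 0L, 0L),
  stringsAsFactors = FALSE
)

for (col in colnames(toy)) {
  if (is.character(toy[[col]])) {
    # Overwrite the column with integer codes (order of first appearance).
    toy[[col]] <- match(toy[[col]], unique(toy[[col]]))
  }
}
str(toy)
```

This overwrites the character columns in place; to keep the originals, the codes could instead be stored under a new name such as paste0(col, "_num") (a hypothetical naming choice).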