Should I scale ordinal data for use in analysis?


#1

Hi,

I have data that includes some nominal variables (such as gender: male, female, other) and a few ordinal variables whose values run from 1 to 5 (1 = Very bad … 5 = Outstanding). There are also some continuous variables, such as salary.
I created dummy variables for the nominal data and scaled the continuous variables using the scale() function in R.

Should I also scale the ordinal variables using scale(), or should they be included as is?

I want to build a classification model with this data, and my target is a categorical variable.

Please advise.

Thanks


#2

@kpksr,

I think scaling is usually a good idea, but your results might not be greatly affected even if you don’t scale the ordinal variables. You can try it both ways yourself.

Regards,
Aayush


#3

Genuine question: What does scaling even mean in the context of ordinal variables?


#4

@anon: Converting High, Medium, Low to 3, 2, 1 and then applying min-max scaling or some other normalization method.
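
As a rough sketch of that idea in R (the ratings vector and the min_max() helper are made up for illustration, not something from the original question):

```r
# Hypothetical ratings; the level order Low < Medium < High is assumed.
rating <- factor(c("Low", "High", "Medium", "Low", "High"),
                 levels = c("Low", "Medium", "High"), ordered = TRUE)

# Ordered factor -> integer codes (Low = 1, Medium = 2, High = 3)
rating_num <- as.integer(rating)

# Min-max scaling to the [0, 1] range (hypothetical helper)
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
rating_scaled <- min_max(rating_num)

print(rating_scaled)
```

Note that this treats the gaps between levels as equal, which is an assumption about the data rather than something the ordering itself guarantees.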


#5

Hi @kpksr
If you scale the continuous variables, they could end up with smaller values than your ordinal variables, which go up to 5. Depending on the algorithm you use, the unscaled ordinal variables will then carry more weight, because their variance will be higher in this case.
Hope this helps
Alain
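
A small sketch of the variance point, using simulated data (the salary and rating vectors are invented here, and the exact numbers will depend on your own data):

```r
set.seed(1)
salary <- rnorm(200, mean = 50000, sd = 12000)  # hypothetical continuous variable
rating <- sample(1:5, 200, replace = TRUE)      # hypothetical 1-5 ordinal variable

# scale() centers to mean 0 and divides by the standard deviation
salary_z <- as.numeric(scale(salary))
rating_z <- as.numeric(scale(rating))

var(salary_z)  # 1 by construction
var(rating)    # variance of the unscaled ratings, depends on their spread
var(rating_z)  # 1 once the ratings are scaled too
```

If var(rating) comes out above 1, a distance-based method such as k-NN would give the unscaled rating more influence than the scaled salary.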


#6

Hi @kpksr

Great question and great answers in this thread. Besides your main goal, if at some point you need to do clustering to improve the classification accuracy, keep in mind the daisy() function from the cluster package, which can compute distances on data that also contains nominal/ordinal variables.

https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html

Regards,

Fernando
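
A minimal sketch of daisy() on mixed data, assuming the Gower metric is what you want (the data frame below is invented for illustration):

```r
library(cluster)

# Hypothetical mixed-type data: nominal, ordinal and continuous columns
df <- data.frame(
  gender = factor(c("Male", "Female", "Other", "Female")),
  rating = factor(c(1, 3, 5, 3), levels = 1:5, ordered = TRUE),
  salary = c(42000, 55000, 61000, 55000)
)

# Gower dissimilarity handles factors, ordered factors and numerics together
d <- daisy(df, metric = "gower")
m <- as.matrix(d)
print(round(m, 3))
```

The resulting dissimilarity object can be passed to clustering functions that accept one, such as pam() from the same package or hclust().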


#7

Since it is a classification problem, if we use tree-based models like decision trees or random forests, we don’t need to scale the variables. With xgboost we also generally don’t scale. You can compare the validation scores of models with and without scaling.
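
A quick way to check this for a single decision tree, sketched with rpart and the built-in iris data (both are stand-ins, not the original poster's data, and for brevity the comparison is on the training rows rather than a proper validation set):

```r
library(rpart)

raw <- iris
scaled <- iris
# Standardize the four numeric predictors; the target stays untouched
scaled[1:4] <- as.data.frame(scale(iris[1:4]))

fit_raw <- rpart(Species ~ ., data = raw)
fit_scaled <- rpart(Species ~ ., data = scaled)

pred_raw <- predict(fit_raw, raw, type = "class")
pred_scaled <- predict(fit_scaled, scaled, type = "class")

# Share of rows where the two trees predict the same class
agreement <- mean(pred_raw == pred_scaled)
print(agreement)
```

Tree splits depend only on the ordering of values within a variable, so a monotone rescaling should leave the partitions unchanged. For a real comparison, hold out a validation set and score both versions on it.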