Clustering Ordinal/Categorical data




I am working on a dataset to identify clusters of people based on their ratings on a Likert scale (1-5, i.e. Strongly disagree to Strongly agree). It consists of 1000 observations and 19 features, all measured on the same scale. I am trying to find answers to the following questions:

a) Is normalization necessary/mandatory before measuring dissimilarity?

b) Which similarity/dissimilarity metric should be applied here to perform hierarchical clustering, e.g. Euclidean, Manhattan, Gower, etc.? And what does a correlation-based distance measure mean?

c) Is the kmeans function in R able to cluster this data with its default metric? If not, what is the alternative?

d) What is the best way to perform this in R?


Clustering in this case does not require normalization, as the scale is the same for all variables.
Normalization is only required when features are on different scales, e.g. age and income. The scales of these two features differ hugely, so the larger-scale feature dominates the Euclidean distance, which can produce misleading clusters.
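
To illustrate the point about scales, here is a minimal sketch in base R. The three people and their age/income values are made up purely for illustration; they are not from the dataset in the question.

```r
# Hypothetical data: three people, two features on very different scales.
people <- data.frame(
  age    = c(25, 60, 26),
  income = c(50000, 50500, 58000)
)
rownames(people) <- c("A", "B", "C")

# Raw Euclidean distances: income differences swamp age differences,
# so the 35-year age gap between A and B barely registers.
d_raw <- dist(people, method = "euclidean")

# After standardising each column (mean 0, sd 1) both features contribute.
d_scaled <- dist(scale(people), method = "euclidean")

print(round(as.matrix(d_raw), 1))
print(round(as.matrix(d_scaled), 2))
```

With 19 Likert items that all run from 1 to 5, no single column can dominate in this way, which is why the scaling step can be skipped.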

I think you should perform conjoint analysis to figure out which features users prefer most, and use the highly preferred features for clustering.
You can also use PCA or t-SNE for dimension reduction and then do the clustering on the reduced data.
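
A rough sketch of the PCA-then-cluster route in base R could look like the following. The data is simulated at random as a stand-in for the survey, the choice of 5 components and 3 clusters is arbitrary, and running PCA on the ratings assumes you are willing to treat Likert values as numeric.

```r
set.seed(42)  # reproducible toy data

# Hypothetical stand-in for the survey: 1000 respondents, 19 Likert items (1-5).
likert <- matrix(sample(1:5, 1000 * 19, replace = TRUE), nrow = 1000, ncol = 19)
colnames(likert) <- paste0("Q", 1:19)

# PCA on the raw ratings; no rescaling since all items share the 1-5 scale.
pca <- prcomp(likert, center = TRUE, scale. = FALSE)

# Keep the first few components (5 is an arbitrary choice for illustration).
scores <- pca$x[, 1:5]

# Cluster the component scores; k = 3 is also an arbitrary choice.
km <- kmeans(scores, centers = 3, nstart = 10)
table(km$cluster)
```

In practice you would pick the number of components from `summary(pca)` and the number of clusters from something like an elbow or silhouette plot rather than fixing them up front.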


@Dhillon, I agree with not scaling the features. If I continue with the cluster analysis without PCA, which distance metric should I select for hierarchical clustering: Euclidean, Manhattan or Gower?
Also, which function should I choose: kmeans(), kmodes() or something else? I understand kmeans() is not the ideal choice as it doesn't support ordinal data.
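
For anyone wanting to compare the metrics mentioned above, one possible sketch is below. It uses `dist()` and `hclust()` from base R and `daisy()` from the `cluster` package. The data is simulated, and average linkage and k = 3 are arbitrary choices, so treat it as a starting point rather than a recommendation.

```r
library(cluster)  # provides daisy(); a recommended package bundled with R

set.seed(1)
# Hypothetical toy data: 200 respondents, 19 Likert items (1-5).
ratings <- as.data.frame(matrix(sample(1:5, 200 * 19, replace = TRUE), ncol = 19))

# Option 1: treat ratings as numeric and use Manhattan distance.
d_man <- dist(ratings, method = "manhattan")

# Option 2: declare the items as ordered factors and use Gower dissimilarity.
ratings_ord <- as.data.frame(lapply(ratings, factor, levels = 1:5, ordered = TRUE))
d_gow <- daisy(ratings_ord, metric = "gower")

# Average-linkage hierarchical clustering, cut into 3 groups (arbitrary k).
hc_man <- hclust(d_man, method = "average")
hc_gow <- hclust(d_gow, method = "average")
groups_man <- cutree(hc_man, k = 3)
groups_gow <- cutree(hc_gow, k = 3)

# Cross-tabulate to see how far the two choices agree.
table(manhattan = groups_man, gower = groups_gow)
```

Because every item shares the same 1-5 scale, the two dissimilarities are likely to behave similarly here. Gower becomes more useful when ordinal items are mixed with nominal or continuous ones.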