How to find the number of clusters in the K-means algorithm?

k-meansclustering
hierarchical_cluster

#1

I am currently studying clustering, and I came across two major clustering algorithms:

1. K-means clustering
2. Hierarchical clustering

In hierarchical clustering, we can use a dendrogram to select the number of clusters in the data, but we cannot use a dendrogram to select the number of clusters for K-means. I want to know how to select the number of clusters in the K-means algorithm.


#2

Hi Sid,

Please refer to this thread.

Regards,
Aayush


#3

hi @sid100158,

1. If your data is all numeric, you can use:

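# Draw a scree plot: total within-cluster sum of squares for k = 1..15
# sapply returns a numeric vector (lapply would return a list)
wss <- sapply(1:15, function(k) kmeans(auto_kmean, centers = k, nstart = 30)$tot.withinss)
plot(1:15, wss, type = "l", xlab = "# of clusters", ylab = "Total Within SS")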

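This gives a scree plot of the total within-cluster sum of squares against the number of clusters.

From the plot you can see that the optimal number of clusters might be 3 or 4, which is where the curve starts to flatten out (the "elbow").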
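
If you want to try the scree plot approach without your own data, here is a minimal self-contained sketch. It assumes the built-in iris dataset as a stand-in for auto_kmean, and the names x, k_values and the seed value are only illustrative choices, not part of the original answer.

```r
# Sketch of the elbow (scree plot) method on a built-in dataset.
# Assumption: iris stands in for your own all-numeric data.
set.seed(42)                      # k-means starts from random centers
x <- scale(iris[, 1:4])           # numeric columns only, standardized

k_values <- 1:10
wss <- sapply(k_values, function(k) {
  # nstart = 30 reruns k-means from 30 random starts and keeps the best fit
  kmeans(x, centers = k, nstart = 30)$tot.withinss
})

# Look for the k after which the curve stops dropping sharply
plot(k_values, wss, type = "b",
     xlab = "# of clusters", ylab = "Total Within SS")
```

Standardizing with scale() is a choice made here so that no single variable dominates the distance; whether it is appropriate depends on your data.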
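2. If your data has a mix of numeric and categorical variables and more than 2000 records, you can use the pamk function in the ‘fpc’ package. With usepam = FALSE it uses clara instead of pam, which is recommended for larger datasets.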

library(fpc)
# usepam = FALSE makes pamk cluster with clara rather than pam
pamk.best <- pamk(insurance_dummies, krange = 1:8, usepam = FALSE)
cat("number of clusters estimated by optimum average silhouette width:", pamk.best$nc, "\n")
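
To see roughly what the average silhouette width criterion is doing, here is a small sketch that computes it by hand with pam from the ‘cluster’ package. It is not how pamk is implemented internally. It again assumes iris as stand-in data, and the names x, k_values, avg_sil and best_k are made up for illustration. The silhouette is not defined for a single cluster, so the range starts at 2.

```r
# Sketch: pick k by the largest average silhouette width, using pam directly.
# Assumption: iris stands in for your own data.
library(cluster)                  # recommended package, ships with standard R installs

x <- scale(iris[, 1:4])           # numeric columns only, standardized

k_values <- 2:8                   # silhouette width needs at least 2 clusters
avg_sil <- sapply(k_values, function(k) {
  pam(x, k)$silinfo$avg.width     # average silhouette width of the k-medoid solution
})

best_k <- k_values[which.max(avg_sil)]
cat("number of clusters with the largest average silhouette width:", best_k, "\n")
```

A higher average silhouette width means points sit closer to their own cluster than to the neighbouring one, so the k that maximizes it is a reasonable candidate to compare against the scree plot.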

Hope this helps!!