# How to find the number of clusters in the K-means algorithm?

#1

I am currently studying clustering, and I came across two major clustering algorithms:

1. K-means clustering
2. Hierarchical clustering

In hierarchical clustering, we can use a dendrogram to select the number of clusters in the data, but we cannot use a dendrogram to select the number of clusters for the K-means algorithm. I want to know how we can select the number of clusters in the K-means algorithm.

#2

Hi Sid,

Regards,
Aayush

#3

hi @sid100158,

1. If all of your data is numeric, you can use:

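``````
# Draw a scree (elbow) plot: total within-cluster sum of squares for k = 1..15
# (auto_kmean is the numeric data set being clustered)
wss <- sapply(1:15, function(k) kmeans(auto_kmean, centers = k, nstart = 30)$tot.withinss)
plot(1:15, wss, type = "l", xlab = "# of clusters", ylab = "Total Within SS")
``````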

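This gives a plot of the total within-cluster sum of squares against the number of clusters.

From the plot you can see that the optimal number might be 3 or 4.

2. If your data has a mix of numeric and categorical variables and more than 2,000 records, you can use the `pamk` function in the ‘fpc’ package: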

``````
library(fpc)
# usepam = FALSE makes pamk() use clara instead of pam, which suits larger data sets
pamk.best <- pamk(insurance_dummies, krange = 1:8, usepam = FALSE)
cat("number of clusters estimated by optimum average silhouette width:", pamk.best$nc, "\n")
``````
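The same average silhouette width criterion can also be applied to K-means directly. Below is a minimal sketch, not the only way to do it: the built-in `iris` data, the range `2:8`, and the seed are my assumptions for illustration, so swap in your own numeric data. It relies on `silhouette()` from the ‘cluster’ package.

```r
library(cluster)  # provides silhouette()

set.seed(42)                 # k-means starts from random centers (seed is arbitrary)
x <- scale(iris[, 1:4])      # illustrative data: numeric columns only, standardized
d <- dist(x)                 # pairwise Euclidean distances

# Average silhouette width for each k (not defined for a single cluster)
ks <- 2:8
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(x, centers = k, nstart = 30)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

# Pick the k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
plot(ks, avg_sil, type = "b", xlab = "# of clusters", ylab = "Average silhouette width")
cat("k with the largest average silhouette width:", best_k, "\n")
```

A higher average silhouette width means the clusters are better separated, so it is worth comparing this result with the elbow in the within-SS plot rather than trusting either one alone.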

Hope this helps!!