How to determine the optimal number of clusters in R

clustering

#1

Hello,

I am doing clustering on some data and have some questions about it. For demonstration purposes I will use the auto data frame (its horsepow and wheelbas columns):

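# assumes a data frame named auto with columns horsepow and wheelbas is already loaded
auto_kmean <- na.omit(data.frame(HP = auto$horsepow, WB = auto$wheelbas))
set.seed(42)  # k-means picks random starting centres, so fix the seed for reproducibility
kmean_auto <- kmeans(auto_kmean, centers = 3, nstart = 25)
plot(auto_kmean, pch = 20, col = kmean_auto$cluster)
# colour each centre with the same palette index as its cluster (3 clusters, 3 colours)
points(kmean_auto$centers, pch = "+", col = 1:3, cex = 2)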

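This gives me a scatter plot of HP against WB, with the points coloured by cluster and the cluster centres marked with "+".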
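The fitted object from kmeans() also stores the quantities behind the between_SS / total_SS figure it prints (betweenss and totss). One common heuristic, the elbow method, fits k-means for a range of k, plots that ratio, and picks the k after which the gain flattens out, since the ratio tends to keep rising as k grows. The sketch below is a minimal illustration under one assumption: because the auto data frame is not bundled with R, it uses three simulated, well-separated 2-D blobs (named sim here, a hypothetical stand-in) in place of auto_kmean.

```r
# Elbow method sketch: track between_SS / total_SS for k = 1..5.
# Stand-in data (assumption): three simulated 2-D blobs instead of auto_kmean.
set.seed(1)
centres <- rbind(c(0, 0), c(10, 0), c(5, 8.66))
sim <- do.call(rbind, lapply(1:3, function(i)
  cbind(x = rnorm(50, centres[i, 1], 0.7), y = rnorm(50, centres[i, 2], 0.7))))

k_values <- 1:5
ratio <- sapply(k_values, function(k) {
  fit <- kmeans(sim, centers = k, nstart = 25)  # several random starts per k
  fit$betweenss / fit$totss  # share of total variance explained by the clustering
})

print(round(ratio, 3))
# Look for the k after which the curve stops rising steeply (the "elbow").
plot(k_values, ratio, type = "b",
     xlab = "Number of clusters k", ylab = "between_SS / total_SS")
```

On data like this the jump from k = 2 to k = 3 is large and the curve is nearly flat afterwards, which is the elbow. On real data the bend is often less clear-cut, so this is best treated as a visual aid rather than a definitive rule.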


I want to know how to find the optimal number of clusters. For example, I can go up to 5 clusters, but 3 might be a better choice than 4 or 5, even though using more clusters increases between_SS/total_SS. Is there any way to find this out in R? I know that business considerations are sometimes used to choose the number of clusters, but here I am looking for a data-driven technique.
Can somebody please help me with this?
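Another data-driven option, unlike between_SS/total_SS, does not automatically improve as k grows: the average silhouette width, which compares how close each point is to its own cluster versus the nearest other cluster, and is then maximised over k. The sketch below is a minimal illustration, assuming the cluster package (a recommended package normally installed with R) is available, and using simulated stand-in data (sim, hypothetical) instead of the question's auto_kmean.

```r
# Average silhouette width sketch for choosing k (k = 2..5).
library(cluster)  # provides silhouette()

# Stand-in data (assumption): three simulated 2-D blobs instead of auto_kmean.
set.seed(1)
centres <- rbind(c(0, 0), c(10, 0), c(5, 8.66))
sim <- do.call(rbind, lapply(1:3, function(i)
  cbind(x = rnorm(50, centres[i, 1], 0.7), y = rnorm(50, centres[i, 2], 0.7))))

d <- dist(sim)       # pairwise Euclidean distances, computed once and reused
k_values <- 2:5      # silhouette needs at least two clusters
avg_sil <- sapply(k_values, function(k) {
  fit <- kmeans(sim, centers = k, nstart = 25)
  sil <- silhouette(fit$cluster, d)
  mean(sil[, "sil_width"])  # average silhouette width over all points
})

best_k <- k_values[which.max(avg_sil)]
print(data.frame(k = k_values, avg_sil = round(avg_sil, 3)))
cat("k with the highest average silhouette:", best_k, "\n")
```

For the real data, pass auto_kmean (ideally rescaled with scale(), since HP and WB are on different scales) in place of sim. The same package also offers clusGap(), an implementation of the gap statistic, as a further criterion to cross-check against.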


#2

Hi @data_hacks,

Refer to this discussion.

Regards,
Aayush