I am clustering some data and have a question about it. For demonstration purposes, I will use the auto dataset in R:
    auto_kmean <- na.omit(data.frame('HP' = auto$horsepow, 'WB' = auto$wheelbas))
    kmean_auto <- kmeans(auto_kmean, centers = 3)
    plot(auto_kmean, pch = 20, col = kmean_auto$cluster)
    points(kmean_auto$centers, pch = "+", col = c("red", "blue"), cex = 2)
This gives me:
I want to know how to find the optimal number of clusters. For example, I can go up to a maximum of 5 clusters, but 3 might be a better choice than 4 or 5, even though using more clusters increases my between_SS/total_SS. Is there any way to determine that in R? I know that business considerations are sometimes used to determine the number of clusters, but here I am looking for a technique to do so.
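To illustrate what I mean about between_SS/total_SS, here is a minimal sketch that computes the ratio for 1 to 5 clusters. It uses the built-in `mtcars` data as a stand-in so that it runs without my auto data (that substitution, the `WT` column, the seed, and `nstart = 25` are assumptions made only for this example):

```r
# Stand-in data (assumption): mtcars columns instead of the auto dataset
set.seed(42)
dat <- na.omit(data.frame(HP = mtcars$hp, WT = mtcars$wt))

# Fit k-means for k = 1..5 and record between_SS / total_SS for each fit
ks <- 1:5
ratio <- sapply(ks, function(k) {
  fit <- kmeans(dat, centers = k, nstart = 25)  # nstart: several random starts
  fit$betweenss / fit$totss
})

print(round(ratio, 3))

# Plot the ratio against the number of clusters
plot(ks, ratio, type = "b",
     xlab = "Number of clusters k",
     ylab = "between_SS / total_SS")
```

The ratio tends to keep rising as k grows, so by itself it does not tell me whether 3 is a better choice than 4 or 5.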
Can somebody please help me with this?