How do we decide the number of clusters to use while implementing the k-means clustering algorithm?




When implementing the k-means clustering algorithm in a model, how should we decide how many clusters to use?
I have read that the number of clusters must be chosen correctly, because an incorrect choice will invalidate the whole process.
Do we just try different numbers of clusters until we find the best one, or is there a method we should follow to find the right number of clusters for the problem?




To determine the number of clusters, you can use the elbow method: run k-means for a range of cluster counts and plot the variance explained against the number of clusters. Why is it called the elbow curve? Because the plot looks like a bent arm. The point where the curve bends most sharply, that is, where adding another cluster stops giving a large gain in explained variance, is taken as the ideal number of clusters.
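Here is a rough sketch of the elbow method in plain Python, in case it helps to see it run. Everything in it is an assumption made for illustration: the toy data (three synthetic blobs), the helper names (`kmeans_wcss`, `elbow_k`, `farthest_point_init`), the deterministic farthest-point initialization, and the "largest second difference" rule for locating the bend, which is only one possible heuristic. It tracks the within-cluster sum of squares (WCSS), which is the complement of the variance explained: it falls as explained variance rises, so the elbow appears at the same k.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))

def farthest_point_init(points, k):
    """Deterministic start: each new center is the point farthest from the chosen ones."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_wcss(points, k, n_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean_point(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def elbow_k(ks, wcss):
    """Heuristic: pick the k where the WCSS curve bends most (largest second difference)."""
    bends = {ks[i]: wcss[i - 1] - 2 * wcss[i] + wcss[i + 1] for i in range(1, len(ks) - 1)}
    return max(bends, key=bends.get)

# Toy data (assumed): three well-separated 2-D blobs of 30 points each.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(30)]

ks = list(range(1, 7))
wcss = [kmeans_wcss(points, k) for k in ks]
best_k = elbow_k(ks, wcss)

for k, w in zip(ks, wcss):
    print(k, round(w, 1))
print("elbow at k =", best_k)
```

On this toy data the WCSS should drop steeply up to k = 3 and flatten afterwards, so the bend lands on the true number of blobs. On real data the elbow is often less sharp, so it is worth looking at the plotted curve as well as the number. If you use scikit-learn, the same curve can be built from the `inertia_` attribute of a fitted `KMeans` model for each k.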