I have categorical data and I’m trying to implement k-modes using the GitHub package available here. My dataset is large, and I want to group it into small clusters of roughly 5-7 records each, where each cluster contains the records most similar to one another. I plan to add a few more restrictions on how these clusters are formed later, but for now, because my data is entirely categorical, I thought k-modes would be a good fit.
However, I currently have no way to select the optimal ‘k’, ideally the one that maximizes the silhouette score. The silhouette score seems like a natural fit because k-modes uses a dissimilarity measure as its distance. I would assume the silhouette score could then be computed from that same dissimilarity, measuring how close each record is to its own cluster and how far it is from the other clusters. I’m not able to find a correct implementation of this.
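To make the idea concrete, here is a minimal sketch of a silhouette score computed from a categorical dissimilarity. It assumes the dissimilarity is the simple matching measure (the number of attributes on which two records differ), which may not be the exact measure used in a given k-modes configuration. The function names `matching_dissimilarity` and `silhouette_from_distances` and the toy data are made up for illustration and are not part of any package.

```python
import numpy as np

def matching_dissimilarity(X):
    """Pairwise count of attributes on which two rows of X differ."""
    X = np.asarray(X)
    # Compare every row with every other row, attribute by attribute.
    # Note: this builds an n x n x m boolean array, so it only suits small n.
    return (X[:, None, :] != X[None, :, :]).sum(axis=2)

def silhouette_from_distances(D, labels):
    """Mean silhouette score from a precomputed distance matrix D."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least 2 clusters")
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score stays 0 by convention
        # Mean distance to the other members of the same cluster
        # (D[i, i] is 0, so dividing by n_same - 1 excludes the point itself).
        a = D[i, same].sum() / (n_same - 1)
        # Mean distance to the nearest other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores.mean()

# Hypothetical toy data: six records, three categorical attributes.
toy_X = np.array([
    ["red", "small", "round"],
    ["red", "small", "round"],
    ["red", "small", "oval"],
    ["blue", "large", "square"],
    ["blue", "large", "square"],
    ["blue", "large", "flat"],
])
toy_D = matching_dissimilarity(toy_X)
good_score = silhouette_from_distances(toy_D, [0, 0, 0, 1, 1, 1])
bad_score = silhouette_from_distances(toy_D, [0, 1, 0, 1, 0, 1])
print(good_score, bad_score)
```

To choose ‘k’, the labels would come from the clustering itself, for example by looping over candidate values of k, fitting k-modes for each, and keeping the k with the highest score. With a large dataset the full pairwise matrix will likely not fit in memory, so scoring on a random sample of records is probably necessary. If scikit-learn is available, its `silhouette_score` with `metric="precomputed"` should accept the same distance matrix in place of the hand-written function.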