KMeans clustering with both numerical and categorical data in PySpark

k-meansclustering
clustering
pyspark

#1

I need to do KMeans-style clustering on data that has both numerical and categorical features. The KModes algorithm clusters categorical data only, and the KPrototypes algorithm clusters data with both numerical and categorical features. I need to implement the clustering in PySpark, but I could not find a PySpark library for KModes or KPrototypes. A PySpark implementation of KModes can be found here. Is there a library, or some other way, to implement KPrototypes in PySpark?
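
For context, KPrototypes is usually described as combining a squared Euclidean distance over the numerical features with a count of mismatches over the categorical features, weighted by a factor often called gamma. Below is a minimal plain-Python sketch of that mixed dissimilarity and of the assignment step. The function names, the tuple layout of the prototypes, and the default `gamma=1.0` are illustrative assumptions, not part of any PySpark or KModes library API.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Squared Euclidean distance on numerical features plus
    gamma times the number of categorical mismatches."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def nearest_prototype(num, cat, prototypes, gamma=1.0):
    """Return the index of the closest prototype.

    `prototypes` is a list of (numerical_values, categorical_values)
    pairs; this layout is an assumption made for the sketch.
    """
    distances = [
        mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)


if __name__ == "__main__":
    prototypes = [([5.0], ["x"]), ([0.5], ["y"])]
    print(nearest_prototype([0.0], ["x"], prototypes, gamma=1.0))
```

In a PySpark setting, one possible approach would be to apply an assignment function like this to each row with the current prototypes shared across workers, then recompute each prototype as the per-cluster mean of the numerical columns and the per-cluster mode of the categorical columns. That is only an outline of one option, not a tested implementation.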