I have customer data with variables such as credit limit, DSO days, etc. The data contains outliers, missing values, and zeros.
I want to cluster the customers into low-risk, medium-risk, and high-risk groups using k-means, and then use KNN for prediction.
Descriptive statistics showed the data to be non-normal. I treated outliers using quantiles, imputed missing values with the column minimum, and then normalized/standardized the data. I then applied k-means clustering, which did not give usable results (the clusters overlap).
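For concreteness, the steps described above could be sketched in R roughly as follows. This is only an illustration on made-up data: the column names `credit_limit` and `dso_days`, the 5%/95% capping thresholds, and the helper functions `cap_outliers` and `impute_min` are assumptions, not the actual data or code.

```r
# Toy data standing in for the real customer table (hypothetical columns).
set.seed(42)
n <- 200
customers <- data.frame(
  credit_limit = rlnorm(n, meanlog = 10, sdlog = 1),  # right-skewed, positive
  dso_days     = rnorm(n, mean = 45, sd = 20)         # may contain negatives
)
customers$credit_limit[sample(n, 10)] <- NA
customers$dso_days[sample(n, 10)] <- NA

# 1. Cap outliers at chosen quantiles (assumed here: 5th and 95th percentiles).
cap_outliers <- function(x, lower = 0.05, upper = 0.95) {
  q <- unname(quantile(x, probs = c(lower, upper), na.rm = TRUE))
  pmin(pmax(x, q[1]), q[2])  # NAs are left as NA at this stage
}

# 2. Impute missing values with the column minimum.
impute_min <- function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE)
  x
}

prepared <- as.data.frame(
  lapply(customers, function(x) impute_min(cap_outliers(x)))
)

# 3. Standardize each column to mean 0 and sd 1.
#    scale() works on negative values as well as positive ones.
scaled <- scale(prepared)

# 4. k-means with 3 clusters and several random starts.
fit <- kmeans(scaled, centers = 3, nstart = 25)
table(fit$cluster)
```

Note that `kmeans()` always returns the requested number of clusters, so a result like this does not by itself show that three well-separated risk groups exist in the data.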
- Do we need to normalize the data for clustering? If so, how? Please suggest a procedure in R.
- What kind of data transformation is useful in this case? How should negative values, which cannot be avoided, be handled?
Bottom line: I am struggling to prepare the data for k-means.