# How to combine 400 shops into a smaller number of groups?

#1

Hi,
I have a dataset of customers and the shops they buy from. It has 10 million customers and 400 shops. I would like to create 5-7 groups out of these 400 shops, based on the customers' frequency of purchase. What technique should I use?

#2

This looks like a clustering problem. First define the metrics you will use to build a meaningful distance between shops.
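As a minimal sketch of what defining a metric could look like, assuming a toy purchase log with hypothetical column names `cust` and `shop`: build a shop-by-customer purchase count table, turn each shop's row into proportions, compute Euclidean distances between the shop profiles, and cut a hierarchical tree into the desired number of groups. With 10 million customers the full table would be too large to hold densely, so a customer sample or a sparse representation would likely be needed.

```r
set.seed(1)
# Toy purchase log; column names and sizes are assumptions for illustration
purchases <- data.frame(
  cust = sample(1:200, 5000, replace = TRUE),
  shop = sample(1:40, 5000, replace = TRUE)
)

# Rows = shops, columns = customers, cells = number of purchases
freq <- table(purchases$shop, purchases$cust)

# Convert each shop's row to proportions so busy and quiet shops are comparable
profile <- prop.table(freq, margin = 1)

# Euclidean distance between shop profiles, then hierarchical clustering
d <- dist(unclass(profile))
tree <- hclust(d, method = "ward.D2")

# Cut the tree into 5 groups (any value from 5 to 7 would fit the question)
groups <- cutree(tree, k = 5)
table(groups)
```

Euclidean distance on proportions is only one possible choice; a correlation or cosine based distance could be swapped in at the `dist` step.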

Alain

#3

Let us create a random dataset:

```
data <- data.frame(
  cust    = sample(seq(1, 1000), 10000, replace = TRUE),
  shop    = sample(seq(1, 40), 10000, replace = TRUE),
  buyTime = sample(seq(ISOdate(2016, 3, 1), by = "day", length.out = 100),
                   10000, replace = TRUE)
)
```

There are 3 columns:
cust : customer ID
shop : shop number
buyTime : purchase date, 2016 March 1 to 2016 June 8 (100 consecutive days)

Many different metrics could be used, as @Lesaffrea mentioned.

One example is the number of visits to each shop:

```
# Count one visit per row
data$count <- 1
# Sum columns 2 (shop) and 4 (count) per shop. Note that the summed `shop`
# column is shop ID x number of visits, not the shop ID itself
modData <- aggregate(data[,c(2,4)],by=list(data$shop),FUN=sum)
# k-means with 5 clusters on the two aggregated columns
kmeans(modData[,c(2,3)],5)

K-means clustering with 5 clusters of sizes 8, 6, 7, 10, 9

Cluster means:
shop    count
1 9210.500 253.5000
2 2691.667 257.6667
3  977.000 248.8571
4 6801.700 247.7000
5 4393.444 245.2222

Clustering vector:
[1] 3 3 3 3 3 3 3 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 1 4 1 1 1 1 1 1 1

Within cluster sum of squares by cluster:
[1] 2963100.0  782712.7 1536266.9 4023194.2 2686967.8
(between_SS / total_SS =  96.4 %)
```
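In the `aggregate` call above, column 2 (`shop`) is summed together with `count`, so the `shop` feature passed to `kmeans` is the shop ID multiplied by its visit count, and it dominates the distance (the clusters follow the shop numbering). A hedged variant, assuming the intent is to group shops by visit count only, keeps the shop ID as a key and clusters on a single scaled feature. The seed and the name `visits` are choices made here for illustration.

```r
set.seed(42)  # arbitrary seed so the random data and k-means starts are repeatable
data <- data.frame(
  cust = sample(1:1000, 10000, replace = TRUE),
  shop = sample(1:40, 10000, replace = TRUE)
)
data$count <- 1

# One row per shop: shop ID as key, total visits as the only feature
visits <- aggregate(count ~ shop, data = data, FUN = sum)

# Scale the feature and run k-means with several random starts
fit <- kmeans(scale(visits$count), centers = 5, nstart = 20)

# Attach the cluster label to each shop
visits$cluster <- fit$cluster
head(visits)
```

Because the data is uniformly random, the resulting groups carry no real meaning here; the point is only that the shop ID no longer leaks into the features.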

You can think of other metrics (for example, something based on the time variable).
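One way a time-based metric might look, as a sketch only: describe each shop by the share of its visits falling on each day of the week, then cluster those profiles. The choice of weekday shares, the use of `Date` values instead of `ISOdate`, and the names `days` and `wdayShare` are assumptions made for this example.

```r
set.seed(7)
# 100 consecutive days starting 2016-03-01, as plain Date values
days <- seq(as.Date("2016-03-01"), by = "day", length.out = 100)
data <- data.frame(
  shop    = sample(1:40, 10000, replace = TRUE),
  buyTime = sample(days, 10000, replace = TRUE)
)

# Day of week as a number (0 = Sunday ... 6 = Saturday), locale independent
data$wday <- as.POSIXlt(data$buyTime)$wday

# Rows = shops, columns = weekdays, cells = share of that shop's visits
wdayShare <- prop.table(table(data$shop, data$wday), margin = 1)

# Cluster shops on their weekday profile
fit <- kmeans(unclass(wdayShare), centers = 5, nstart = 20)

# List the shop numbers in each cluster
split(rownames(wdayShare), fit$cluster)
```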

Regards,
Anant