Clustering in R with a time component



Looking for some advice here:

I have these columns: product, booking_weekly_date, qty, revenue, cost, and 2 more numerical variables.

Initially I clustered the products with k-means on qty, revenue, cost and the other 2 numerical variables, after aggregating all the numerical values at the PID level.

But if we need to bring in booking_weekly_date, the numerical columns will be aggregated at the (PID, bookingDate) level. What will the clusters convey then? Will they capture the seasonality pattern of a product? I'd appreciate any help.
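For readers following along, the PID-level approach described above could look roughly like the sketch below. The data frame `bookings`, its values, and the choice of 2 clusters are made up for illustration, and standardising with `scale()` is an assumption rather than something stated in the post.

```r
# Hypothetical bookings: several rows per product (PID). Values are invented.
bookings <- data.frame(
  PID     = c("A", "A", "B", "B", "C", "C", "D", "D"),
  qty     = c(10, 12, 1, 2, 3, 4, 20, 22),
  revenue = c(1000, 100, 2000, 5000, 3000, 2500, 800, 900),
  cost    = c(5, 51, 35, 50, 51, 40, 10, 12)
)

# One row per product: sum every numeric column within each PID.
per_pid <- aggregate(cbind(qty, revenue, cost) ~ PID, data = bookings, FUN = sum)

# Standardise so that revenue (large values) does not dominate the distances.
features <- scale(per_pid[, c("qty", "revenue", "cost")])

set.seed(42)  # k-means starts from random centers, so fix the seed
fit <- kmeans(features, centers = 2, nstart = 10)

# Attach the cluster label to each product.
per_pid$cluster <- fit$cluster
print(per_pid)
```

Here each row going into `kmeans()` is one product, so a cluster is a group of products with similar totals.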


Hi @rumsinha.

I’ll be able to assist you better if you can provide me with a few rows of your dataset.


Thanks Saurav,

A few rows of dummy data are below:
PID, week_start_date, Sales Order Count,Ship Set Count, Revenue, Booking Quantity, Booking Cost
A, 03-jan-2016,5,2,1000,10,5
B, 07-jan-2016,15,10,100,12,51
C, 10-jan-2016,10,5,2000,1,35
D, 17-jan-2016,12,6,5000,2,50
E, 24-jan-2016,4,1,3000,3,51

Can you please help me decide which algorithm to use to get the best clusters? There are approximately 20,000+ records.

It is 2 years of data for 1000 PIDs. Booking dates are weekly, but a given PID is not necessarily booked every week.


Ok @rumsinha.

So don't add the date as a clustering dimension straight away. Instead, extract the day, month and year from the date and use them as clustering features to capture seasonality.

Hope it helps. :slight_smile:


Thanks, Saurav. So with the date split into its parts,
should I use PAM clustering?


Saurav, if one PID has booking dates like these, how can I do the clustering with the booking date included?
A, 03-jan-2016,5,2,1000,10,5
A, 07-jan-2016,15,10,100,12,51
A, 10-jan-2016,10,5,2000,1,35
A, 17-jan-2016,12,6,5000,2,50
A, 24-jan-2016,4,1,3000,3,51

So what information should I extract from the booking date, and how do I proceed with clustering?

Without the date I used k-means, but once the weekly booking date comes into the picture, how should I proceed?



Use as.numeric(format(date1, "%m")) to extract the month, and similarly extract the year ("%Y") and the day ("%d"). Then drop the original date column and use the extracted values as features for clustering.
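A minimal sketch of that suggestion, under some assumptions: the data frame `bookings` and its values are invented, the date column is called `date1` as in the snippet above and is already of class Date, and scaling plus k-means with 2 clusters is only one possible choice, not a recommendation from this thread.

```r
# Hypothetical (PID, week) rows. Values are invented for illustration.
bookings <- data.frame(
  PID     = c("A", "A", "A", "B", "B", "B"),
  date1   = as.Date(c("2016-01-03", "2016-07-10", "2017-01-08",
                      "2016-01-10", "2016-07-17", "2017-07-09")),
  qty     = c(10, 12, 1, 2, 3, 30),
  revenue = c(1000, 100, 2000, 5000, 3000, 400),
  cost    = c(5, 51, 35, 50, 51, 20)
)
# Strings such as "03-jan-2016" would first need as.Date(x, format = "%d-%b-%Y"),
# which relies on the session locale using English month names.

# Extract the date parts as plain numbers.
bookings$month <- as.numeric(format(bookings$date1, "%m"))
bookings$year  <- as.numeric(format(bookings$date1, "%Y"))
bookings$day   <- as.numeric(format(bookings$date1, "%d"))

# Drop the identifier and the raw date, then standardise what is left.
feature_cols <- setdiff(names(bookings), c("PID", "date1"))
features <- scale(bookings[, feature_cols])

set.seed(42)  # k-means starts from random centers, so fix the seed
fit <- kmeans(features, centers = 2, nstart = 10)

bookings$cluster <- fit$cluster
print(bookings[, c("PID", "date1", "cluster")])
```

Note that each row here is one (PID, week) pair, so a cluster groups product-weeks rather than products, and the same PID can land in different clusters in different weeks.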


Thanks Saurav