First and second transaction dates using R from transaction table


#1

Hello Team,
I have a transaction table which has multiple transactions with customer IDs, Now for each customer i want to find out first order date and second order date using R. Note : For each customer there are so many transactions, but i want to extract only first order date and second order date.


#2

Use dplyr package


#3

Hello! Use dplyr package
diff_MAX <- as.data.frame(diff %>% group_by(ID) %>% top_n(1, date)) - for Max date per client
diff_MAX <- as.data.frame(diff %>% group_by(ID) %>% top_n(2, date)) - for Max 2 date per client

Good Luck!


#4

My method is abit weird and taking time but I am pretty sure it will works

using “plyr” package to create cluster all transaction belonging to one customer ID into 1 row like this
table <- ddply(table, c(“customerId”), function(df1)paste(df1$transactionId, collapse=","))
Then save file “table” to csv. format.
If you wanna read it in R then import csv file into R again.
Note: detach package “dplyr” if using before load “plyr”


#5

Hello! Here is a solution with data.table:

library(data.table)
transactions <- data.table(transactions)

setkey(transaction, order_date) # sort table by date
transactions[transactions[, head(.I,2), by = customer_id]$V1]

Explanation:

transactions[, head(.I,2), by = customer_id]

Gives you a table with customer_id and indices (stored in ‘V1’ column) of first 2 transactions for every customer in a transaction table.
Then you subset your table with these indices.

For 400k table ~ 7 seconds.


#6

Thank you Guys