I have a transaction table which has multiple transactions with customer IDs, Now for each customer i want to find out first order date and second order date using R. Note : For each customer there are so many transactions, but i want to extract only first order date and second order date.
Use dplyr package
Hello! Use dplyr package
diff_MAX <- as.data.frame(diff %>% group_by(ID) %>% top_n(1, date)) - for Max date per client
diff_MAX <- as.data.frame(diff %>% group_by(ID) %>% top_n(2, date)) - for Max 2 date per client
My method is abit weird and taking time but I am pretty sure it will works
using “plyr” package to create cluster all transaction belonging to one customer ID into 1 row like this
table <- ddply(table, c(“customerId”), function(df1)paste(df1$transactionId, collapse=","))
Then save file “table” to csv. format.
If you wanna read it in R then import csv file into R again.
Note: detach package “dplyr” if using before load “plyr”
Hello! Here is a solution with data.table:
library(data.table) transactions <- data.table(transactions) setkey(transaction, order_date) # sort table by date transactions[transactions[, head(.I,2), by = customer_id]$V1]
transactions[, head(.I,2), by = customer_id]
Gives you a table with customer_id and indices (stored in ‘V1’ column) of first 2 transactions for every customer in a transaction table.
Then you subset your table with these indices.
For 400k table ~ 7 seconds.
Thank you Guys