How to handle data with unique interaction id but with multiple events


#1

Hi All,

I have a dataset which has an interaction_id column and events column.

| interaction_id | events         |
| -------------- | -------------- |
| i123           | enquiry        |
| i123           | sell           |
| i123           | update details |
| i978           | view options   |
| i999           | sell           |
| i999           | enquiry        |

As you can see, a unique interaction_id can have multiple events that belongs to it. The problem that I am trying to solve (or to validate if possible or not to solve) is to structure the above data so it can answer these:

  • when there is an sell event, what other events are there?
  • how many interactions have both sell and enquiry events?

At the moment, the solution I have thought about is adding tagging columns for each interaction_id but this solution can only answer the 2nd question above.

Do you have any suggestion on how to tackle this problem? Or is it simply unable to be solved using data structure?

Thanks!


#2

Hi @cruisybd,

You can structure the data in a better way. Let the rows be an individual interaction_id and columns be the events. So it should look something like this :

 ID   inquiry    sell   update_details    view_options
i123     1         1          1               0 
i1978    0         0          0               1
i999     1         1          0               0

This should work if you have less events. If not that, below is the approach you can follow

  1. For each interaction id, check is it has a sell event.
  2. If no, move to the next interaction id. Else, print all other events for the id.
  1. Let the initial count be 0.
  2. For each interaction id, check if both sell and enquiry events are present.
  3. If yes, increase the count by 1. Else move to the next interaction id.

#3

Hi @AishwaryaSingh,

Thank you for your input.

This is the approach that I have thought about but the problem is with this structure, I am not able to answer the second question. (I should have given more context on the second one).

This will work if I produce the output line by line, but what I am trying to achieve is, the count of interaction id for each event. So for example, the answer would be:

# i123 and i999 have a sell event and the other events that occurred were
inquiry 2
update details 1

It is double counting a unique interaction id but for now, that’s what I am trying to achieve.

Thanks


#4
  1. Let us suppose you have count_inq = 0 and count_details=0 .
  2. Check for each interation_id, if the id has a sell event.
  3. If yes, check inquiry event , increase count_inq, check count_details, and so on for all the events.
  4. If the id has no sell event, simply move to the next id

(implementing this will require a good amount of coding)


#5

Hi @AishwaryaSingh,

Thanks for the suggested solution. Based on the approach, I think then there’s not a clean simple solution where I can just use simple SELECT, FILTER, GROUP_BY - sql like functions by somehow modifying the data structure.

Regards,
Billy