How to compare rows of data

For the Black Friday case, I am assuming that the gender, age, occupation, …, and marital status are the same in every row for a given User_ID.

How can I go about verifying this hypothesis?

Here’s what I mean:
How can I verify that the Gender, Age, Occupation, City_Category, Stay_In_Current_City_Years, and Marital_Status are F, 0-17, 10, A, 2 and 0 respectively in every row where the User_ID is 1000001?

Thank you.


Hi @fehsuccess,

You will have to create a for loop that compares the columns for every set of User_ID. Here is a basic approach.

  1. Take two variables i and j. Suppose i has the User_ID at index 0 and j has User_ID at index 1.
  2. Compare i and j.
  3. When i and j are equal, compare the other columns. If they are the same, move to the next index; if not, print the index value.
  4. When i and j are not equal, move to the next index.
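
The steps above can be sketched roughly as follows. This is only a minimal sketch: it assumes the rows are already sorted by User_ID (so that all rows for one user are adjacent) and that each row is a plain dict, which could be obtained with something like train.to_dict("records"). The function name find_mismatched_rows and the sample rows are made up for illustration.

```python
# Columns assumed to be constant per user (names taken from the question).
ATTRIBUTE_COLUMNS = [
    "Gender", "Age", "Occupation", "City_Category",
    "Stay_In_Current_City_Years", "Marital_Status",
]


def make_row(user_id, *values):
    """Build one row dict from a User_ID and the six attribute values."""
    row = {"User_ID": user_id}
    row.update(zip(ATTRIBUTE_COLUMNS, values))
    return row


def find_mismatched_rows(records):
    """Return indices of rows that differ from the previous row of the same user.

    `records` must be sorted by User_ID (hypothetical helper, not from the thread).
    """
    mismatched = []
    for j in range(1, len(records)):
        prev, curr = records[j - 1], records[j]
        if prev["User_ID"] != curr["User_ID"]:
            continue  # different users: nothing to compare, move on
        if any(prev[col] != curr[col] for col in ATTRIBUTE_COLUMNS):
            mismatched.append(j)  # same user, but at least one attribute differs
    return mismatched


# Made-up sample: user 1000002 has two different Age values.
sample = [
    make_row(1000001, "F", "0-17", 10, "A", "2", 0),
    make_row(1000001, "F", "0-17", 10, "A", "2", 0),
    make_row(1000002, "M", "55+", 16, "C", "4+", 0),
    make_row(1000002, "M", "26-35", 16, "C", "4+", 0),
]
print(find_mismatched_rows(sample))
```

Because only neighbouring rows are compared, this makes a single pass over the data, but it will miss nothing only if the sort-by-User_ID assumption holds.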

PS: The Black Friday dataset has a large number of rows and columns, so this iteration will take a lot of time (unless you have good computational power). If you can optimize the loop, do share your approach.

Thanks for your response, AishwaryaSingh

Right! I’m definitely not going to do it for all the IDs – that’ll take too much time.

I finally figured it out.

Basically, I used the .nunique method to count the unique Gender, Age, Occupation, etc. values for each User_ID, converted the result to a list, and used that for my comparisons. See the code below:

#sum up list elements
def sum_list(listname):
    sum_of_element = 0
    for element in listname:
        sum_of_element += element
    return sum_of_element

#remove duplicates (keeps the first occurrence of each value)
def Remove(duplicate):
    final_list = []
    for num in duplicate:
        if num not in final_list:
            final_list.append(num)
    return final_list

list_of_IDs = Remove(train.loc[:,'User_ID'].values.tolist())
needed_columns = train.loc[:, 'User_ID':'Marital_Status']

likely_erratic = []

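for ID in list_of_IDs:
    # number of unique values in each column for this User_ID
    a = needed_columns.loc[needed_columns.User_ID == ID, :].nunique().values.tolist()
    # a consistent ID has exactly one unique value in each of the 7 columns
    if sum_list(a) != 7:
        likely_erratic.append(ID)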
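
The per-ID loop can probably be avoided by letting pandas do the grouping. The following is a minimal sketch, assuming pandas and the column names from the question; the helper name inconsistent_users and the sample data are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Columns assumed to be constant per user (names taken from the question).
ATTRIBUTE_COLUMNS = [
    "Gender", "Age", "Occupation", "City_Category",
    "Stay_In_Current_City_Years", "Marital_Status",
]


def inconsistent_users(df, columns=ATTRIBUTE_COLUMNS):
    """Return the User_IDs that have more than one distinct value in any column."""
    # One row per User_ID, one column per attribute, values = distinct counts.
    unique_counts = df.groupby("User_ID")[columns].nunique()
    # A user is inconsistent if any attribute has more than one distinct value.
    has_conflict = (unique_counts > 1).any(axis=1)
    return unique_counts.index[has_conflict].tolist()


# Made-up sample: user 1000002 has two different Age values.
sample = pd.DataFrame({
    "User_ID": [1000001, 1000001, 1000002, 1000002],
    "Gender": ["F", "F", "M", "M"],
    "Age": ["0-17", "0-17", "55+", "26-35"],
    "Occupation": [10, 10, 16, 16],
    "City_Category": ["A", "A", "C", "C"],
    "Stay_In_Current_City_Years": ["2", "2", "4+", "4+"],
    "Marital_Status": [0, 0, 0, 0],
})
print(inconsistent_users(sample))
```

On the real data this would be called as inconsistent_users(train); an empty list would support the hypothesis. Note that nunique ignores missing values by default, so a user with one real value and some NaNs would still count as consistent.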

Great approach! :+1:

