For the Black Friday case, I am assuming that the gender, age, occupation, …, and marital status are the same in every row for a given User_ID.
How can I go about verifying this hypothesis?
Here’s what I mean:
How can I verify that the Gender, Age, Occupation, City_Category, Stay_In_Current_City_Years, and Marital_Status are F, 0-17, 10, A, 2 and 0 respectively in every row where the User_ID is 1000001?
You will have to write a for loop that compares these columns across the rows that share a User_ID. Here is a basic approach.
- Take two variables, i and j.
- i has the User_ID at index 0 and j has the User_ID at index 1.
- If i and j are equal, compare the columns in question. If they are the same, move to the next index; if not, print the index value.
- If i and j are not equal, move to the next index.
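The steps above can be sketched roughly as follows. This is only an illustration on a small made-up DataFrame, not the real dataset: the toy values and the names `columns_to_check` and `mismatched_indices` are assumptions, and the rows are sorted by User_ID first so that rows belonging to the same user end up next to each other.

```python
import pandas as pd

# Toy stand-in for the Black Friday training data (hypothetical values).
train = pd.DataFrame({
    "User_ID": [1000001, 1000002, 1000001, 1000002, 1000001],
    "Gender": ["F", "M", "F", "M", "F"],
    "Age": ["0-17", "55+", "0-17", "26-35", "0-17"],
    "Occupation": [10, 16, 10, 16, 10],
    "Marital_Status": [0, 0, 0, 0, 0],
})
columns_to_check = ["Gender", "Age", "Occupation", "Marital_Status"]

# Sort so that rows of the same user are adjacent; a stable sort keeps
# the original row order within each user.
ordered = train.sort_values("User_ID", kind="stable")

mismatched_indices = []
for pos in range(len(ordered) - 1):
    i = ordered.iloc[pos]        # current row
    j = ordered.iloc[pos + 1]    # next row
    if i["User_ID"] == j["User_ID"]:
        # Same user: every checked column should match.
        if not (i[columns_to_check] == j[columns_to_check]).all():
            mismatched_indices.append(int(ordered.index[pos + 1]))
    # Different users: nothing to compare, move on.

print(mismatched_indices)
```

Comparing only neighbouring rows is enough here, because if every adjacent pair within a user matches, all of that user's rows match.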
PS: The Black Friday dataset has a large number of rows and columns, so this iteration will take a lot of time (unless you have good computational power). If you can optimize the loop, do share your approach.
Thanks for your response, AishwaryaSingh
Right! I’m definitely not going to do it for all the IDs – that’ll take too much time.
I finally figured it out.
Basically, I used the .nunique function to list the number of unique Gender, Age, Occupation, etc. values each User_ID possessed, converted the result to a list, and used that for my comparisons. See the code below:
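# train is assumed to be the Black Friday training DataFrame, loaded earlier

# sum up list elements
def sum_list(listname):
    sum_of_element = 0
    for element in listname:
        sum_of_element += element
    return sum_of_element

# drop duplicates while keeping first-seen order
def Remove(duplicate):
    final_list = []
    for num in duplicate:
        if num not in final_list:
            final_list.append(num)
    return final_list

list_of_IDs = Remove(train.loc[:,'User_ID'].values.tolist())
needed_columns = train.loc[:, 'User_ID':'Marital_Status']
likely_erratic = []
for ID in list_of_IDs:
    a = needed_columns.loc[needed_columns.User_ID == ID, :].nunique().values.tolist()
    # a consistent user has exactly one unique value in each of the 7 columns
    if sum_list(a) != 7:
        likely_erratic.append(ID)  # assumed completion; the original post is cut off here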
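For anyone who wants to avoid the per-ID loop, the same idea can probably be expressed with a single groupby. The sketch below is not the code from the post above: it runs on a small made-up DataFrame, and `demographic_columns`, `unique_counts`, and `single_user` are names chosen only for illustration. It assumes the six columns from the question are present under those exact names.

```python
import pandas as pd

# Toy stand-in for the Black Friday training data (hypothetical values).
train = pd.DataFrame({
    "User_ID": [1000001, 1000001, 1000002, 1000002],
    "Gender": ["F", "F", "M", "M"],
    "Age": ["0-17", "0-17", "55+", "26-35"],
    "Occupation": [10, 10, 16, 16],
    "City_Category": ["A", "A", "C", "C"],
    "Stay_In_Current_City_Years": ["2", "2", "4+", "4+"],
    "Marital_Status": [0, 0, 0, 0],
})
demographic_columns = [
    "Gender", "Age", "Occupation", "City_Category",
    "Stay_In_Current_City_Years", "Marital_Status",
]

# Number of distinct values per user in each column (NaN is ignored by default).
unique_counts = train.groupby("User_ID")[demographic_columns].nunique()

# A user is suspicious if any column holds more than one distinct value.
likely_erratic = unique_counts[(unique_counts > 1).any(axis=1)].index.tolist()
print(likely_erratic)

# Spot check for a single user: one row left after dropping duplicates
# means that user's values never change.
single_user = train.loc[train["User_ID"] == 1000001, demographic_columns].drop_duplicates()
print(len(single_user) == 1)
```

An empty `likely_erratic` list would support the hypothesis that these columns are constant per User_ID. Selecting the columns by name, rather than by a slice, also keeps unrelated columns out of the count.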