Identify column values that are in train but missing in test



I have data of the following structure

I want to check how many values of the column “Item_Identifier” are missing in the test data.

I already have the unique “Item_Identifier” values from test, obtained with:

item_ids_in_test = test.Item_Identifier.unique()

I could loop over the values and check each one, but the dataset is large and that would be slow. Is there a better way, ideally one that pandas provides?



You can use Python's set function to get the values of a series in train that are not in test. Use the following script:
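A minimal sketch of the set approach, assuming train and test are pandas DataFrames that each have an Item_Identifier column (the toy data and the name missing_in_test are made up for illustration):

```python
import pandas as pd

# Toy stand-ins for the real train and test frames (illustrative data only).
train = pd.DataFrame({"Item_Identifier": ["FDA15", "DRC01", "FDN15", "FDX07", "DRC01"]})
test = pd.DataFrame({"Item_Identifier": ["FDA15", "FDX07", "NCD19"]})

# Set difference: identifiers that occur in train but never in test.
# Building a set also removes duplicates, so each identifier is counted once.
missing_in_test = set(train["Item_Identifier"]) - set(test["Item_Identifier"])

print(sorted(missing_in_test))  # the identifiers themselves
print(len(missing_in_test))     # how many of them there are
```

Set membership checks are hash-based, so this avoids an explicit Python loop over every pair of values.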




Hi @mohdsanadzakirizvi

You can also try using:

import numpy as np
np.setdiff1d(train['Item_Identifier'], test['Item_Identifier'])

This returns a sorted array of the unique values that are in train but not in test.
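Since the question asks how many values are missing, here is a short sketch that counts them and, as an optional extra, pulls the matching rows out of train with isin. It assumes train and test are pandas DataFrames with an Item_Identifier column; the toy data and the names only_in_train, n_missing and rows_missing are illustrative only:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real train and test frames (illustrative data only).
train = pd.DataFrame({"Item_Identifier": ["FDA15", "DRC01", "FDN15", "FDX07", "DRC01"]})
test = pd.DataFrame({"Item_Identifier": ["FDA15", "FDX07", "NCD19"]})

# Unique identifiers present in train but absent from test.
only_in_train = np.setdiff1d(train["Item_Identifier"], test["Item_Identifier"])

# Number of distinct identifiers missing from test.
n_missing = len(only_in_train)

# Rows of train whose identifier never appears in test (duplicates are kept).
rows_missing = train[~train["Item_Identifier"].isin(test["Item_Identifier"])]

print(only_in_train)
print(n_missing)
print(rows_missing)
```

Note that n_missing counts distinct identifiers, while rows_missing keeps every matching row, so its length can be larger when identifiers repeat in train.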

Hope this helps.