Identify column values that are in train but missing in test

pandas
python

#1

I have data of the following structure

I want to check how many values of the column “Item_Identifier” that appear in train are missing from the test data.

I already have the unique “Item_Identifier” values from the test set via:

item_ids_in_test = test.Item_Identifier.unique()

I could loop over the values and check each one, but the dataset is large and that would take time. Is there a better way, ideally one that pandas provides?


#2

Hi,

You can use Python's built-in set type to get the values of a Series that are in train but not in test. Use the following:

set(train.Item_Identifier).difference(set(test.Item_Identifier))

Below is a screen capture of a simulation:
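As a small sketch of the set approach, here it is on made-up toy frames. The IDs below are hypothetical and not taken from the original dataset; only the column name “Item_Identifier” comes from the question. Taking len() of the result gives the count the question asks for.

```python
import pandas as pd

# Hypothetical toy data; the real train/test frames are not shown in the thread.
train = pd.DataFrame({"Item_Identifier": ["FDA15", "DRC01", "FDN15", "FDA15", "NCD19"]})
test = pd.DataFrame({"Item_Identifier": ["FDA15", "NCD19", "FDX07"]})

# IDs that occur in train but never in test.
# set.difference accepts any iterable, so the second Series needs no explicit set().
only_in_train = set(train.Item_Identifier).difference(test.Item_Identifier)

print(sorted(only_in_train))  # ['DRC01', 'FDN15']
print(len(only_in_train))     # 2
```

Note that a set has no guaranteed order, so sort the result if you need stable output.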


#3

Hi @mohdsanadzakirizvi

You can also try using:

import numpy as np
np.setdiff1d(train['Item_Identifier'], test['Item_Identifier'])

This returns a sorted array of the unique values that are in train but not in test.
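A minimal sketch of this on hypothetical toy data (the IDs are invented for illustration), followed by a pandas-only variant using Series.isin. The isin variant is an alternative not mentioned earlier in the thread, which may be handy if you also want the matching rows rather than just the values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the question.
train = pd.DataFrame({"Item_Identifier": ["FDA15", "DRC01", "FDN15", "FDA15", "NCD19"]})
test = pd.DataFrame({"Item_Identifier": ["FDA15", "NCD19", "FDX07"]})

# Sorted, de-duplicated values present in train but absent from test.
missing = np.setdiff1d(train["Item_Identifier"], test["Item_Identifier"])
print(missing.tolist())  # ['DRC01', 'FDN15']
print(missing.size)      # 2

# pandas-only alternative: boolean mask of train rows whose ID is not in test.
mask = ~train["Item_Identifier"].isin(test["Item_Identifier"])
missing_pd = train.loc[mask, "Item_Identifier"].unique()  # order of first appearance
print(list(missing_pd))  # ['DRC01', 'FDN15']
```

Here mask.sum() counts the train rows affected, while missing.size counts distinct IDs; the two differ whenever an ID repeats in train.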

Hope this helps.
Shubham