KeyError in the below attached code

pandas
machine_learning
dataframe
python
bigmart

#1

Hi all, I am running the code given in the article "Approach and Solution to break in Top 20 of Big Mart Sales prediction".

I'm working through that project in Python 3 and I'm getting a KeyError. Can anyone please explain how to solve this problem?


#2

Hi - Can you post the full traceback for the error?


#3

Did this ever get resolved? I too am having this issue while I work through the BigMart example. The code I believe is causing the issue is:

#Impute data and check #missing values before and after imputation to confirm
print ('Original #missing: %d'% sum(miss_bool))
data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])
print ('Final #missing: %d'% sum(data['Item_Weight'].isnull()))


#4

#Determine the average weight per item
item_avg_weight = data.pivot_table(values='Item_Weight', index='Item_Identifier')

#Get a boolean variable specifying missing Item_Weight values
miss_bool = data['Item_Weight'].isnull()

#Impute data and check missing values before and after imputation to confirm
print ('Orignal #missing: %d'% sum(miss_bool))
data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])
print ('Final #missing: %d'% sum(data['Item_Weight'].isnull()))

Orignal #missing: 2439

The traceback is as follows:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2441             try:
-> 2442                 return self._engine.get_loc(key)
   2443             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'FDP10'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-32-711d75ca66b3> in <module>()
      7 #Impute data and check missing values before and after imputation to confirm
      8 print ('Orignal #missing: %d'% sum(miss_bool))
----> 9 data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])
     10 print ('Final #missing: %d'% sum(data['Item_Weight'].isnull()))

~\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-32-711d75ca66b3> in <lambda>(x)
      7 #Impute data and check missing values before and after imputation to confirm
      8 print ('Orignal #missing: %d'% sum(miss_bool))
----> 9 data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight[x])
     10 print ('Final #missing: %d'% sum(data['Item_Weight'].isnull()))

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   1962             return self._getitem_multilevel(key)
   1963         else:
-> 1964             return self._getitem_column(key)
   1965 
   1966     def _getitem_column(self, key):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   1969         # get column
   1970         if self.columns.is_unique:
-> 1971             return self._get_item_cache(key)
   1972 
   1973         # duplicate columns & possible reduce dimensionality

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1643         res = cache.get(item)
   1644         if res is None:
-> 1645             values = self._data.get(item)
   1646             res = self._box_item_values(item, values)
   1647             cache[item] = res

~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2442                 return self._engine.get_loc(key)
   2443             except KeyError:
-> 2444                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2445 
   2446         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'FDP10'
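
The traceback passes through `frame.py` `__getitem__` and `_getitem_column`, which suggests that `item_avg_weight` is a DataFrame here and that `item_avg_weight[x]` is being treated as a column lookup. A minimal sketch of that behaviour, assuming a pandas version where `pivot_table` returns a DataFrame; the toy rows below are made up for illustration and are not from the BigMart file:

```python
import pandas as pd

# Toy stand-in for the BigMart data (illustrative values only).
data = pd.DataFrame({
    "Item_Identifier": ["FDP10", "FDP10", "DRA12", "DRA12"],
    "Item_Weight": [19.0, None, 11.6, None],
})

item_avg_weight = data.pivot_table(values="Item_Weight", index="Item_Identifier")

# The item identifiers end up in the row index; 'Item_Weight' is the only column.
print(type(item_avg_weight).__name__)
print(list(item_avg_weight.columns))

try:
    item_avg_weight["FDP10"]      # [] on a DataFrame looks for a column named 'FDP10'
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# .loc looks the identifier up among the row labels instead.
row_value = item_avg_weight.loc["FDP10", "Item_Weight"]
print(column_lookup_failed, row_value)
```

If that reading is right, 'FDP10' is simply the first item identifier with a missing weight, not a column that has gone missing.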

#5

I suggest re-running the cells from the start. If the 'FDP10' column was dropped from the data frame, that could cause this error.

Anyway, here's a handy one-line command to fill the missing values. Please do check it out:

data['Item_Weight'].fillna(value=data['Item_Weight'].mean(), inplace=True)

Hope this helped. Thanks!
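
Two things are worth knowing about the one-liner above. It fills every gap with the overall mean weight rather than the per-item average the article computes, so it is a different imputation. Also, depending on the pandas version, calling an `inplace` method on a selected column may emit a warning or leave the frame unchanged. A hedged sketch that assigns the result back instead, on made-up toy data:

```python
import pandas as pd

# Toy column with one missing weight (illustrative values only).
data = pd.DataFrame({"Item_Weight": [10.0, None, 20.0]})

# Compute the overall mean (NaN is skipped by default) and assign the
# filled column back, instead of relying on inplace=True.
overall_mean = data["Item_Weight"].mean()
data["Item_Weight"] = data["Item_Weight"].fillna(overall_mean)

print(data["Item_Weight"].isnull().sum())
```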


#6

I was actually able to solve the issue with the following bit of code:

#Determine the average weight per item
item_avg_weight = data.pivot_table(values='Item_Weight', index='Item_Identifier')

#Get a boolean variable specifying missing Item_Weight values
miss_bool = data['Item_Weight'].isnull()

#Impute data and check missing values before and after imputation to confirm
print ('Orignal #missing: %d'% sum(miss_bool))
data.loc[miss_bool,'Item_Weight'] = data.loc[miss_bool,'Item_Identifier'].apply(lambda x: item_avg_weight.loc[x].values[0])
print ('Final #missing: %d'% sum(data['Item_Weight'].isnull()))
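
The fix above works because `.loc[x]` looks the identifier up among the row labels. The same per-item imputation can also be sketched without `apply`, using `groupby(...).transform('mean')`. This is an alternative I am swapping in, not the article's method, and the toy rows (including the hypothetical item 'NCX29' with no recorded weight at all) are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the BigMart data (illustrative values only).
data = pd.DataFrame({
    "Item_Identifier": ["FDP10", "FDP10", "DRA12", "DRA12", "NCX29"],
    "Item_Weight": [19.0, None, 11.6, None, None],
})

miss_bool = data["Item_Weight"].isnull()
print("Original #missing: %d" % miss_bool.sum())

# Per-row mean weight of the row's own item, aligned with data's index.
item_mean = data.groupby("Item_Identifier")["Item_Weight"].transform("mean")

# Fill only the gaps; rows that already have a weight are left alone.
data["Item_Weight"] = data["Item_Weight"].fillna(item_mean)
print("Final #missing: %d" % data["Item_Weight"].isnull().sum())
```

In this sketch an item with no recorded weight at all simply stays missing rather than raising an error, so the final count is worth checking.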


#7

This worked for me :)