Filling NaN values



I am doing some basic data exploration and have found that
if Outlet_Type is Grocery Store, Outlet_Size is always Small.

I want to use this information to fill some of the NaNs in Outlet_Size where Outlet_Type is Grocery Store.

Can someone help me with how to apply a condition when using fillna?

PS: I am working in Python.


OK, I tried a different approach, since I didn't know the conventional way.
I created a sub-DataFrame of the original df that contains only the rows where Outlet_Type is Grocery Store.

This subset contained only NaN and Small values for Outlet_Size.
I tried to use fillna to set all the NaN values to Small.
It worked without inplace=True.
But when I add inplace=True, it throws an error.

Is this a valid way to use fillna, or am I missing something here?

My code:
Grocery = df.loc[df.Outlet_Type == 'Grocery Store']
Grocery.Outlet_Size.fillna('Small', inplace=True)

The error I am getting:

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation:
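
For readers hitting the same message: it most likely appears because Grocery is a filtered copy of df, so filling it in place does not change df itself. A minimal sketch of one way around it, assuming the column names from this thread and made-up toy rows, is to assign through a boolean mask on the original frame:

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the thread, the rows are made up for illustration.
df = pd.DataFrame({
    "Outlet_Type": ["Grocery Store", "Grocery Store",
                    "Supermarket Type1", "Supermarket Type1"],
    "Outlet_Size": ["Small", np.nan, "Medium", np.nan],
})

# Rows that are Grocery Store AND have a missing Outlet_Size.
mask = (df["Outlet_Type"] == "Grocery Store") & df["Outlet_Size"].isna()

# Assign on the original frame through .loc instead of on a filtered copy.
df.loc[mask, "Outlet_Size"] = "Small"
```

Only the missing Grocery Store rows are touched; missing values for other outlet types stay NaN and can be handled separately.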


As mentioned in this article, you can do

# Impute the missing values with the overall mode of the column:
df['Outlet_Size'] = df['Outlet_Size'].fillna(df['Outlet_Size'].mode()[0])


True, but Outlet_Size has an overall mode of Medium, whereas the data shows that whenever Outlet_Type is Grocery Store, Outlet_Size is always Small.

So if you just take the mode over all values, you are filling in wrong values.
So how do we fill NaNs only for the subset of missing values where Outlet_Type is Grocery Store?
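
One possible way to do this in pandas is to compute the mode within each Outlet_Type and use it only where Outlet_Size is missing. This is a sketch under the assumption that the columns are named as in this thread; the toy rows and the helper name group_mode are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the thread, the rows are made up for illustration.
df = pd.DataFrame({
    "Outlet_Type": ["Grocery Store", "Grocery Store", "Grocery Store",
                    "Supermarket Type1", "Supermarket Type1", "Supermarket Type1"],
    "Outlet_Size": ["Small", "Small", np.nan, "Medium", "Medium", np.nan],
})

# Most frequent non-missing Outlet_Size within each Outlet_Type
# (Series.mode() ignores NaN by default; empty groups fall back to NaN).
group_mode = df.groupby("Outlet_Type")["Outlet_Size"].agg(
    lambda s: s.mode().iloc[0] if not s.mode().empty else np.nan
)

# Look up each row's group mode and use it only where Outlet_Size is missing.
df["Outlet_Size"] = df["Outlet_Size"].fillna(df["Outlet_Type"].map(group_mode))
```

With this approach Grocery Store rows get the Grocery Store mode rather than the overall mode of the column.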


Hey @ashishpj,

I just mentioned a valid way to impute missing values.

If you want to learn other techniques for it, you can refer to the “missing values treatment” topic in this article


Hi Ashishpj,

I hope this will solve your problem

Let's assume your dataset is named train.

I am using NumPy and pandas to solve your problem:

train['Outlet_Size'] = np.where(((train['Outlet_Size'].isnull()) & (train['Outlet_Type'] =="Grocery")),\
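
The line above is cut off in the post. A complete version of the same np.where idea might look like the sketch below; the fill value "Small" and the label "Grocery Store" are taken from the question earlier in the thread (not from this reply), and the toy rows are made up:

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the thread, the rows are made up for illustration.
train = pd.DataFrame({
    "Outlet_Type": ["Grocery Store", "Grocery Store",
                    "Supermarket Type1", "Supermarket Type1"],
    "Outlet_Size": ["Small", np.nan, "Medium", np.nan],
})

# True where Outlet_Size is missing and the outlet is a Grocery Store.
is_missing_grocery = train["Outlet_Size"].isnull() & (train["Outlet_Type"] == "Grocery Store")

# Where the condition holds use "Small", otherwise keep the existing value.
train["Outlet_Size"] = np.where(is_missing_grocery, "Small", train["Outlet_Size"])
```

Note that the comparison string has to match the label in the data exactly, so "Grocery" alone would not match rows labelled "Grocery Store".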


How do I impute a categorical variable in R using the mode? Could someone please help with filling the missing values of “Outlet_Size” using the mode based on “Outlet_Type” in R?


Hi @subrato312, try the following code to impute the missing data with the mode of Outlet_Size.

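library(dplyr)

missing_index = which(df$Outlet_Size == "")
for(i in missing_index){
  x = df$Outlet_Type[i]
  # Count the non-missing sizes for this Outlet_Type and keep the most frequent one
  y = df %>% filter(Outlet_Type == x, Outlet_Size != "") %>% group_by(Outlet_Size) %>% summarise(count = n()) %>% top_n(1, count)
  # Take the first row in case of ties
  df$Outlet_Size[i] = y$Outlet_Size[1]
}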


Maybe it’s kinda self-promotion, but I deal with this in my (incomplete) kernel -->

Feel free to suggest anything :)


Since the outlets with missing size values belong to ‘Tier 2’ and ‘Tier 3’ locations,
we will impute them with the mode of the corresponding location type.

tmpT2Outlets <- subset(bigmart, bigmart$Outlet_Location_Type == 'Tier 2')
tmpT3Outlets <- subset(bigmart, bigmart$Outlet_Location_Type == 'Tier 3')

bigmart$Outlet_Size[bigmart$Outlet_Size == "" & bigmart$Outlet_Location_Type == 'Tier 2'] <- "Small"
bigmart$Outlet_Size[bigmart$Outlet_Size == "" & bigmart$Outlet_Location_Type == 'Tier 3'] <- "Medium"

PS: The code segment is written in R.