Filling NaN values

conditional_format
big_mart_sales
nan

#1

I am doing some basic data exploration and have found that
whenever Outlet_Type is Grocery Store, Outlet_Size is always Small.

I want to use this information to fill some of the NaNs in Outlet_Size where Outlet_Type = Grocery Store.

Can someone help me with how to use an if condition in fillna?

PS: I am working with Python.


#2

OK, I tried a different approach, since I didn't know the conventional way.
I created a sub-DataFrame of the original df that contains only the rows where Outlet_Type = Grocery Store.

This subset contained only the values NaN and Small for Outlet_Size.
I tried to use fillna to simply set all the NaN values to Small.
It worked without inplace=True.
But now when I add inplace=True, it throws an error.

Is this a valid way to use fillna, or am I missing something here?

My code:
Grocery = df.loc[df.Outlet_Type == 'Grocery Store']
Grocery.Outlet_Size.unique()
Grocery.Outlet_Size.fillna('Small', inplace=True)
Grocery.Outlet_Size.unique()

The error I am getting:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._update_inplace(new_data)
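
A note on the message above: it is a warning rather than a hard error. pandas is saying that Grocery may be a copy of df, so filling it in place may not change df at all. A minimal sketch of one way around this, assuming the goal is to update df itself, is to build a boolean mask and assign through .loc on the original frame. The toy data below is made up for illustration; only the column names come from the thread.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the Big Mart columns discussed in this thread.
df = pd.DataFrame({
    "Outlet_Type": ["Grocery Store", "Grocery Store",
                    "Supermarket Type1", "Supermarket Type1"],
    "Outlet_Size": ["Small", np.nan, np.nan, "Medium"],
})

# Boolean mask: Grocery Store rows whose Outlet_Size is missing.
mask = (df["Outlet_Type"] == "Grocery Store") & df["Outlet_Size"].isna()

# Assign through .loc on the original frame, so no filtered copy is involved.
df.loc[mask, "Outlet_Size"] = "Small"

print(df)
```

Missing sizes for the other outlet types are left untouched, so they can be handled by a separate rule.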


#3

As mentioned in this article, you can impute with the mode:

# Impute missing Outlet_Size values with the column's most frequent value
df['Outlet_Size'] = df['Outlet_Size'].fillna(df['Outlet_Size'].mode()[0])

#4

True, but Outlet_Size has a mode of Medium, whereas in the data every row where Outlet_Type is Grocery Store has an Outlet_Size of Small.

So if you just take the mode over all values, you are putting in wrong values.
So how do we fillna for only the subset of missing values where Outlet_Type is Grocery Store?


#5

Hey @ashishpj,

I just mentioned a valid way to impute missing values.

If you want to know other techniques for it, you can refer to the “missing values treatment” topic in this article.


#6

Hi Ashishpj,

I hope this will solve your problem.

Let's assume your data set is named train.

I am using NumPy and pandas to solve your problem:

import numpy as np
# Set Outlet_Size to "Small" only where it is missing and the outlet is a Grocery Store
train['Outlet_Size'] = np.where(train['Outlet_Size'].isnull() & (train['Outlet_Type'] == "Grocery Store"),
                                "Small", train['Outlet_Size'])
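
If you would rather stay with fillna, as the original question asked, here is a small sketch of one option. fillna accepts a Series and aligns it on the index, so you can pass it a per-row fill value derived from Outlet_Type. The size_by_type dictionary is a hypothetical rule table introduced for illustration, and the toy data is made up.

```python
import numpy as np
import pandas as pd

# Toy data; only the column names come from the thread.
train = pd.DataFrame({
    "Outlet_Type": ["Grocery Store", "Supermarket Type1",
                    "Grocery Store", "Supermarket Type2"],
    "Outlet_Size": [np.nan, np.nan, "Small", "Medium"],
})

# Hypothetical rule table: the fill value to use for each Outlet_Type.
# Types that are not listed map to NaN, so their gaps stay unfilled.
size_by_type = {"Grocery Store": "Small"}
fill_values = train["Outlet_Type"].map(size_by_type)

# Only cells that are missing AND have a non-NaN fill value are changed.
train["Outlet_Size"] = train["Outlet_Size"].fillna(fill_values)

print(train)
```

More rules can be added to the dictionary later without touching the fillna line.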

#7

How do I impute a categorical variable in R using the mode? Could someone please help with filling the missing values of “Outlet_Size” using the mode based on “Outlet_Type” in R?


#8

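Hi @subrato312, try the following code to impute the missing data with the mode of Outlet_Size within each Outlet_Type.

library(dplyr)

# Rows where Outlet_Size is missing (stored as an empty string)
missing_index = which(df$Outlet_Size == "")
for(i in missing_index){
  x = df$Outlet_Type[i]
  # Most frequent non-missing Outlet_Size among outlets of the same type
  y = df %>% filter(Outlet_Type == x, Outlet_Size != "") %>% count(Outlet_Size, sort = TRUE)
  # Skip types with no observed sizes; on ties, take the first row
  if(nrow(y) > 0) df$Outlet_Size[i] = as.character(y$Outlet_Size[1])
}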
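
For anyone following the Python part of this thread, the same idea (fill each gap with the mode of Outlet_Size within its Outlet_Type) can be sketched in pandas with groupby and transform. This is an illustration on made-up data, assuming missing values are stored as NaN; the helper fill_with_group_mode is a name introduced here, not part of pandas.

```python
import numpy as np
import pandas as pd

# Toy data; only the column names come from the thread.
df = pd.DataFrame({
    "Outlet_Type": ["Grocery Store", "Grocery Store", "Grocery Store",
                    "Supermarket Type1", "Supermarket Type1",
                    "Supermarket Type1", "Supermarket Type1"],
    "Outlet_Size": ["Small", "Small", np.nan,
                    "Medium", "Medium", "High", np.nan],
})

def fill_with_group_mode(sizes: pd.Series) -> pd.Series:
    """Fill NaNs in one group with that group's most frequent value."""
    modes = sizes.mode()      # mode() ignores NaN by default
    if modes.empty:           # the group has no observed size at all
        return sizes
    return sizes.fillna(modes.iloc[0])  # on ties, take the first mode

# transform applies the helper per Outlet_Type and keeps the original row order.
df["Outlet_Size"] = (
    df.groupby("Outlet_Type")["Outlet_Size"].transform(fill_with_group_mode)
)

print(df)
```

Groups with no observed size are left as NaN rather than guessed.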