How can I treat missing values in a pandas DataFrame and apply the defined functions to other DataFrames in Python?




I am working on a problem where a DataFrame has missing values for some of the variables, such as Age, Salary and Sales.

To treat these values, I want to define functions that take a DataFrame as input, treat the missing values and return the DataFrame after treatment. Please help me with the methods for treating missing values, and with how I can apply a defined function to any other DataFrame that has missing values of a similar type.

Please help
Thank you



There are multiple methods to treat missing values, but before treating them we should look for the reasons behind them. They could be due to an error in data extraction, occur at random (every observation has the same probability of being missing), or depend on another variable. After that we can choose a treatment method such as deletion, imputation or model building.
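As a first look at where and why values are missing, something like the following can help. It is only a sketch: the sample data is made up for illustration, and the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "Age": [25, np.nan, 40, 31, np.nan],
    "Salary": [50000, 62000, np.nan, 58000, 61000],
    "Sales": [200, 340, 410, np.nan, 380],
})

# Number and share of missing values per column.
missing_count = df.isna().sum()
missing_share = df.isna().mean()

# Rough check for dependence on another variable: compare the mean
# Salary of rows where Age is present (False) and missing (True).
salary_by_age_missing = df.groupby(df["Age"].isna())["Salary"].mean()

print(missing_count)
print(missing_share)
print(salary_by_age_missing)
```

A clear difference between the two groups hints that the values are not missing completely at random, although a small sample like this one proves nothing by itself.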

Deletion methods are used when the missing data is “missing completely at random”; otherwise, deleting non-random missing values can bias the model output.
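In pandas, deletion is usually done with dropna. A minimal sketch, again on made-up data (the 80% threshold is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Age": [25, np.nan, 40, 31, np.nan],
    "Salary": [50000, 62000, np.nan, 58000, 61000],
    "Sales": [200, 340, 410, np.nan, 380],
})

# Listwise deletion: drop every row with at least one missing value.
complete_rows = df.dropna()

# Drop rows only when a specific column is missing.
age_known = df.dropna(subset=["Age"])

# Drop columns that have fewer than 80% non-missing values.
mostly_filled = df.dropna(axis=1, thresh=int(0.8 * len(df)))
```

dropna returns a new DataFrame, so df itself is left unchanged.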

Imputation is a method to fill in the missing values with estimated values. The objective is to find relationships among the valid values of the data set that help in estimating the missing ones. It can be done using measures like the mean, median or mode, or other imputation methods. Sometimes a missing value is directly related to the values of other variables; in those cases we use those values to impute it.
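A short sketch of these imputation options with fillna. The data and the Dept and Gender columns are invented for the example; which statistic suits which column depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Dept": ["A", "A", "B", "B", "B", "A"],
    "Gender": ["M", "F", None, "F", "M", "F"],
    "Age": [25, np.nan, 40, 31, np.nan, 29],
    "Salary": [50000, 52000, np.nan, 58000, 61000, 47000],
})

filled = df.copy()
# Numeric columns: fill with the mean or the median of the observed values.
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
filled["Salary"] = filled["Salary"].fillna(filled["Salary"].median())
# Categorical column: fill with the most frequent value (mode).
filled["Gender"] = filled["Gender"].fillna(filled["Gender"].mode().iloc[0])

# Using another variable: fill Age with the mean Age of the same Dept.
by_group = df.copy()
dept_mean_age = by_group.groupby("Dept")["Age"].transform("mean")
by_group["Age"] = by_group["Age"].fillna(dept_mean_age)
```

The group-wise version assumes the grouping column itself (Dept here) has no missing values.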

Model building fills in the missing values with estimates predicted by a statistical model, which is fitted on the rows where the value is present and uses the other variables as predictors.
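As an illustration of the model-based approach, here is a minimal sketch that predicts Age from Salary with a straight-line fit (np.polyfit). The data is made up, the choice of a linear model is an assumption, and the predictor column is assumed to have no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: Salary is complete, Age has gaps.
df = pd.DataFrame({
    "Salary": [30000, 40000, 50000, 60000, 70000, 80000],
    "Age": [25, 30, np.nan, 40, np.nan, 50],
})

known = df["Age"].notna()

# Fit Age = slope * Salary + intercept on the rows where Age is known.
slope, intercept = np.polyfit(
    df.loc[known, "Salary"].to_numpy(),
    df.loc[known, "Age"].to_numpy(),
    deg=1,
)

# Predict Age only for the rows where it is missing.
imputed = df.copy()
imputed.loc[~known, "Age"] = slope * df.loc[~known, "Salary"] + intercept
```

With more predictors or a non-linear relationship, the same pattern applies with any regression model: fit on the rows where the target is known, then predict for the rest.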


I recently worked on the Titanic data set and found missing values for Age. A common way to replace the missing Age values there is mean imputation:

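import numpy as np
# df is the Titanic DataFrame; fill missing Age with the mean of the observed ages
meanAge = np.mean(df["Age"])
df["Age"] = df["Age"].fillna(meanAge)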
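To reuse the same treatment on other DataFrames, as asked in the question, you can wrap it in a function. The sketch below is one possible design, not a standard API: the name treat_missing, the strategies mapping and the sample frames are all made up for illustration.

```python
import numpy as np
import pandas as pd

def treat_missing(df, strategies):
    """Return a copy of df with missing values treated per column.

    strategies maps a column name to "mean", "median", "mode", "drop",
    or to a constant that is used as the fill value.
    """
    out = df.copy()
    for col, how in strategies.items():
        if col not in out.columns:
            continue  # allows reusing one mapping on frames without this column
        if isinstance(how, str) and how == "drop":
            out = out.dropna(subset=[col])
        elif isinstance(how, str) and how == "mean":
            out[col] = out[col].fillna(out[col].mean())
        elif isinstance(how, str) and how == "median":
            out[col] = out[col].fillna(out[col].median())
        elif isinstance(how, str) and how == "mode":
            modes = out[col].mode()
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
        else:
            out[col] = out[col].fillna(how)  # constant fill value
    return out

# Hypothetical usage on two different DataFrames.
strategies = {"Age": "mean", "Salary": "median", "Sales": 0}
train = pd.DataFrame({
    "Age": [22, np.nan, 38],
    "Salary": [40000, 52000, np.nan],
    "Sales": [np.nan, 310, 150],
})
other = pd.DataFrame({
    "Age": [np.nan, 30, 50],
    "Salary": [np.nan, 45000, 55000],
})
train_clean = treat_missing(train, strategies)
other_clean = treat_missing(other, strategies)
```

Note that each call computes the mean, median or mode from the DataFrame it is given. If the second DataFrame should be filled with statistics from the first (for example test data filled with training means), compute those values once and pass them in as constants.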

You can also find more detail on “How to Treat Missing values” here, and more examples of treating missing values in Python here.