Missing at random and missing not at random


While handling missing values, I came across two types: missing at random and missing not at random. Can somebody please explain these two types, or point me to some resources, and also explain how to handle each of them?


Missing values in a dataset can be of two kinds:

  1. Missing at random
  2. Missing not at random

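If the data is missing at random in the loose sense, that is, whether a value is missing does not depend on any other variable, the rows with missing values can be dropped or the values can be imputed with a simple statistic such as the median. (The statistical literature calls this stricter case "missing completely at random", MCAR.)
If the missingness depends on other variables in the dataset, modelling techniques such as regression on those variables can be used to impute the missing values. Note that in the standard terminology this case, where the missingness depends only on observed variables, is the one called "missing at random" (MAR). "Missing not at random" (MNAR) means the missingness depends on the unobserved value itself, and imputing from other variables cannot fully correct for that.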
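
To make the difference between the two imputation strategies concrete, here is a minimal sketch in plain Python. The data is made up for illustration (it is not the real Titanic data), and the helper names `impute_median` and `impute_regression` are hypothetical, not from any library. The regression is a simple least-squares line of fare on passenger class.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_regression(x, y):
    """Fill None entries of y with predictions from a least-squares
    line y = intercept + slope * x, fitted on rows where y is observed."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]


# Toy data (assumed, for illustration only): class and fare,
# with the last two fares missing.
pclass = [1, 1, 2, 2, 3, 3, 1, 3]
fare = [80.0, 90.0, 30.0, 20.0, 8.0, 10.0, None, None]

print(impute_median(fare))
print(impute_regression(pclass, fare))
```

On this toy data the median fills both gaps with 25.0, regardless of class, while the regression fills in roughly 77.7 for the first-class passenger and roughly 1.7 for the third-class one. When the variable really is related to another column, the model-based fill is usually the more plausible one.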

Whether the data is missing at random or not can be judged by analyzing a margin plot of the variable with missing values against other variables that could be related to it (based on the meaning of those variables). For example, in the Titanic data set, the meaning of the variables suggests that Fare should be related to Pclass (passenger class).
How to read this information from the margin plot is explained in its documentation.
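
A rough numeric check in the same spirit, though not the margin plot itself, is to compare how often a variable is missing within each group of another variable. The sketch below uses made-up data and a hypothetical helper `missing_rate_by_group`. A clearly uneven missing rate across groups suggests the missingness depends on that variable.

```python
from collections import defaultdict


def missing_rate_by_group(groups, values):
    """Return {group: fraction of entries in that group that are None}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_missing, n_total]
    for g, v in zip(groups, values):
        counts[g][0] += v is None  # True counts as 1
        counts[g][1] += 1
    return {g: missing / total
            for g, (missing, total) in sorted(counts.items())}


# Toy data (assumed, for illustration only): age is missing
# more often in class 3 than in classes 1 and 2.
pclass = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
age = [38, 35, 54, None, 27, 14, None, 30, None, None, 22, None]

rates = missing_rate_by_group(pclass, age)
print(rates)
```

Keep in mind that a check like this can only reveal dependence on variables that are actually observed. If the missingness depends on the missing value itself, the data alone cannot show it.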

Hope this helped