How to interpret a margin plot in R?

r
datavisualization

#1

Hello

Can someone help me decipher this margin plot?

This plot is based on the Titanic data set. After exploring the data, I decided to plot age vs cabin, since they have the highest number of missing values. After installing the VIM package, I ran:

library(VIM)
marginplot(train[c(6,11)], col = c('Blue','Purple'))

I tried to dig deeper by checking the documentation and learned that blue represents the available data while purple represents missing data. Yet I am unable to form a story from this plot.

Along with the margin plot, I tried an aggr plot as well, which looks like this:

aggr1_plot <- aggr(train, col = c('Blue','Purple'), numbers = TRUE, sortVars = TRUE, cex.axis = .8, gap = 3, ylab = c('Histogram of Missing Data','Pattern'))

For some reason, in this aggr plot, the variable name ‘age’ is missing from the x axis.

Please help me decipher the margin plot and the pattern (aggr) plot.

Regards
Manish


#2

In the first plot, you can see that the missing values of both variables, cabin and age, are scattered, i.e. there aren’t any clusters in which the data are missing.
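As a rough cross-check of what the margin plot is showing, the same counts can be computed in base R. This is only a minimal sketch on made-up data: the data frame `toy` and its columns `age` and `fare` are hypothetical stand-ins for two numeric columns, not the actual Titanic file.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  age  = c(22, NA, 26, 35, NA, 54, 2, NA),
  fare = c(7.25, 71.28, NA, 53.10, 8.05, NA, 21.08, NA)
)

# Logical indicators: TRUE where the value is missing
miss_age  <- is.na(toy$age)
miss_fare <- is.na(toy$fare)

# How many values are missing in each variable, and in both at once
n_age_missing  <- sum(miss_age)
n_fare_missing <- sum(miss_fare)
n_both_missing <- sum(miss_age & miss_fare)

# Median fare for rows where age is observed (FALSE) vs missing (TRUE).
# A clear difference between the two groups would suggest that the
# missingness in age is related to fare rather than being scattered.
fare_median_by_age_missing <- tapply(toy$fare, miss_age, median, na.rm = TRUE)

print(c(age = n_age_missing, fare = n_fare_missing, both = n_both_missing))
print(fare_median_by_age_missing)
```

If the two groups look alike, that supports reading the missing values as scattered; if they differ a lot, the missingness probably depends on the other variable.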
In the second plot:
Left plot region: a barplot with the proportion of missing or imputed values in each variable.
Right plot region: an aggregation plot showing all existing combinations of missing (red), imputed (orange) and observed (blue) values. Additionally, the frequencies of the different combinations are visualized by a small barplot and by the number of their occurrences on the right side.
The colors named above are the defaults; with col = c('Blue','Purple') as in your call, blue marks observed values and purple marks missing ones.
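The two regions can also be reproduced as plain numbers, which sometimes makes the picture easier to read. Again, this is just a sketch on invented data: `toy` and its columns are hypothetical, and the code uses base R only rather than anything from VIM.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  age   = c(22, NA, 26, 35, NA, 54, 2, NA),
  fare  = c(7.25, 71.28, NA, 53.10, 8.05, NA, 21.08, NA),
  cabin = c(NA, "C85", NA, "C123", NA, "E46", NA, NA),
  stringsAsFactors = FALSE
)

# Counterpart of the left region: proportion of missing values per variable
prop_missing <- colMeans(is.na(toy))

# Counterpart of the right region: label each row with its combination of
# observed ("obs") and missing ("NA") values, then count each combination
pattern <- apply(is.na(toy), 1, function(r) {
  paste(ifelse(r, "NA", "obs"), collapse = " | ")
})
pattern_counts <- sort(table(pattern), decreasing = TRUE)

print(prop_missing)
print(pattern_counts)
```

Each entry in `pattern_counts` corresponds to one row of the aggregation plot, and its count corresponds to the number shown on the right side.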

I have found this document on CRAN (https://cran.r-project.org/web/packages/VIMGUI/vignettes/VIM-Imputation.pdf) very helpful for understanding the VIM package.


#3

Although your answer has raised many questions in my mind, I’ll first check the link that you’ve shared and then discuss further. Thanks @suhalivyas.