How should NA values be treated in time series and cross-sectional data?

r
missing_values

#1

I am currently studying how to treat NA values in a given variable, and I found that the appropriate treatment depends on the type of data. I found that there are two types of data:
Time series - A time series is a sequence of data points, typically consisting of successive measurements made over a time interval.
Cross-sectional - A cross-sectional study is a type of observational study that involves the analysis of data collected from a population, or a representative subset, at one specific point in time.
I would like to know how NA values can be treated for a variable belonging to each of these two data types.


#2

@harry-
Missing value treatment is a critical step that can affect your analysis to a great extent. The idea is based on a nearest-neighbour approach: fill a missing value from the most granular group of similar records first, and move to a higher level of aggregation only when that is not possible. For example, if employee size is missing for a company in the USA, you can fill it with the average employee size of all USA-based companies or, more coarsely, with the overall average of the variable. The treatment can also vary depending on whether the variable is continuous or categorical, skewed or normally distributed, time series or cross-sectional, and on the modeling technique you intend to use.
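
To make the granular-first idea concrete, here is a minimal sketch in plain Python (standard library only). The function name `impute_by_group`, the column names and the sample figures are made up for illustration, and missing values are represented as `None`. It fills from the group mean and falls back to the overall mean when a group has no observed values.

```python
from statistics import mean

def impute_by_group(rows, group_key, value_key):
    """Fill None with the group mean, falling back to the overall mean.

    Hypothetical helper; assumes at least one observed value overall.
    """
    observed = [r[value_key] for r in rows if r[value_key] is not None]
    overall = mean(observed)

    # Collect the observed values of each group.
    by_group = {}
    for r in rows:
        if r[value_key] is not None:
            by_group.setdefault(r[group_key], []).append(r[value_key])
    group_means = {g: mean(vals) for g, vals in by_group.items()}

    filled = []
    for r in rows:
        new = dict(r)  # copy so the input rows stay untouched
        if new[value_key] is None:
            # Most granular level first, overall mean as the fallback.
            new[value_key] = group_means.get(new[group_key], overall)
        filled.append(new)
    return filled

# Illustrative data, not taken from any real source.
companies = [
    {"country": "USA", "employees": 100},
    {"country": "USA", "employees": 300},
    {"country": "USA", "employees": None},
    {"country": "India", "employees": 50},
    {"country": "Brazil", "employees": None},
]
filled = impute_by_group(companies, "country", "employees")
```

Here the missing USA company is filled from the other USA companies, while the Brazilian company, which has no observed peers, gets the overall mean.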

Time Series: Here missing values should be filled using averages of neighbouring time periods. For example, if data is missing for 2013, you can take the average of 2012 and 2014 to fill 2013, or simply take the moving average of 2010 to 2012. If the data has seasonality, you should try to capture it, for example by using averages of compounded growth rates.
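
The two options in the 2013 example can be sketched as follows. This is a plain-Python illustration with hypothetical function names and made-up yearly figures, with `None` marking the missing year. It does not handle seasonality.

```python
def fill_with_neighbours(series):
    """Fill an isolated None with the mean of the previous and next values."""
    out = list(series)
    for i in range(1, len(series) - 1):
        if (series[i] is None
                and series[i - 1] is not None
                and series[i + 1] is not None):
            out[i] = (series[i - 1] + series[i + 1]) / 2
    return out

def fill_with_trailing_mean(series, window=3):
    """Fill None with the mean of the preceding `window` values.

    Values filled earlier in the pass count as available history.
    A gap with fewer than `window` preceding values is left as None.
    """
    out = list(series)
    for i in range(len(out)):
        history = out[max(0, i - window):i]
        if (out[i] is None and len(history) == window
                and all(v is not None for v in history)):
            out[i] = sum(history) / window
    return out

# Illustrative yearly values for 2010..2014, with 2013 missing.
sales = [10.0, 12.0, 14.0, None, 18.0]
by_neighbours = fill_with_neighbours(sales)    # uses 2012 and 2014
by_trailing = fill_with_trailing_mean(sales)   # uses 2010 to 2012
```

The neighbour average uses information from both sides of the gap, whereas the trailing mean only looks backwards, which matters when the gap sits at the end of the series.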

Cross-Sectional: Cross-sectional data is usually treated with the mean for a continuous variable and the mode for a categorical variable. If the distribution of a continuous variable is skewed, it is better to use the median, which is less affected by extreme values.
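
A small sketch of these three choices, using only Python's standard `statistics` module. The helper name `impute` and the sample values are assumptions for illustration, and `None` stands for NA.

```python
from statistics import mean, median, mode

def impute(values, strategy):
    """Replace None with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# A skewed continuous variable: one very large value pulls the mean up.
incomes = [30, 32, 35, None, 400]
mean_filled = impute(incomes, "mean")
median_filled = impute(incomes, "median")

# A categorical variable is filled with its most frequent level.
colours = ["red", "blue", "red", None]
mode_filled = impute(colours, "mode")
```

With the skewed `incomes` example, the mean fill is dragged towards the single large value, while the median fill stays close to the bulk of the data.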

Hope this helps!

Regards,
Hinduja


#3

You are right, these two types of data have to be treated differently.

Cross-Sectional
For cross-sectional data, inter-attribute correlations (relationships between the variables) need to be employed in order to estimate the missing data.
R packages for this are: Amelia, mice, VIM
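
As a rough illustration of what using inter-attribute correlations means, the sketch below fits a simple least-squares line of one attribute on another from the complete cases and uses it to predict the missing values. It is written in plain Python with a hypothetical function name and made-up numbers, and it is far simpler than what the packages above do. It only shows the underlying idea.

```python
def regression_impute(xs, ys):
    """Fill None in ys using a least-squares line fitted on complete pairs.

    Assumes xs has no missing values and at least two distinct x values
    appear among the complete pairs.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Keep observed values, predict the missing ones from x.
    return [intercept + slope * x if y is None else y
            for x, y in zip(xs, ys)]

# Illustrative data where y is roughly twice x.
x_attr = [1, 2, 3, 4]
y_attr = [2, 4, None, 8]
y_filled = regression_impute(x_attr, y_attr)
```

A single deterministic prediction like this ignores the uncertainty of the estimate, which is one reason dedicated imputation packages exist.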

Time Series
For time series, inter-time correlations (relationships between observations at different points in time) need to be employed in order to estimate the missing data.
R Packages for this are: imputeTS
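
One of the simplest ways to exploit inter-time correlations is linear interpolation between the nearest observed points on either side of a gap. The plain-Python sketch below uses a hypothetical function name and made-up values. It is not the imputeTS API, only an illustration of the idea.

```python
def interpolate_linear(series):
    """Fill runs of None by linear interpolation between observed neighbours.

    Leading and trailing gaps have only one neighbour and are left as None.
    """
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = series[left] + step * (i - left)
    return out

# Illustrative series with a two-point gap and a trailing gap.
readings = [1.0, None, None, 4.0, None]
interpolated = interpolate_linear(readings)
```

Unlike a single neighbour average, this handles gaps of any length, but it still assumes the series changes smoothly across the gap.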

This paper ( http://arxiv.org/abs/1510.03924 ) on univariate time series imputation explains these differences in a little more detail in its introduction.