I have a time series dataset in which the target variable ranges from -3000000 to somewhere around 120000000, with lots of 0 values in between and no clear pattern (or at least no obvious one). Because of this extreme range, the scaling/normalization methods do not work properly either, and they require me to predict the variable with up to 7 decimal places of accuracy.
My question is: is an ARIMA model advisable for this dataset? The paper below does not recommend ARIMA models for highly volatile data.
Paper Name -
“Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation”
How should we proceed in cases like this? Any guidance would be really helpful.
Are you talking about Engle’s famous paper?
With this kind of series you can model just the conditional variance, but it is also possible to model the mean, as in ARIMA models, together with the conditional variance of the errors. Some books on these topics that I remember fondly are:
Tsay: An Introduction to Analysis of Financial Data with R (2012)
Enders: Applied Econometric Time Series (2014)
They may also be useful for you.
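To make "model the mean and the conditional variance of the errors" concrete, here is one common way to write such a model, an ARMA(p, q) mean with GARCH(1,1) errors. The symbols (\mu, \phi_i, \theta_j, \omega, \alpha, \beta, z_t) are notation I am assuming for illustration; they are not taken from the question or from Engle’s paper.

```latex
% Mean equation (the ARMA part)
y_t = \mu + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t

% Errors with time-varying conditional variance
\varepsilon_t = \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1)

% Conditional variance equation (GARCH(1,1); setting \beta = 0 gives ARCH(1))
\sigma_t^2 = \omega + \alpha \, \varepsilon_{t-1}^2 + \beta \, \sigma_{t-1}^2
```

Modelling only the conditional variance corresponds to keeping just a constant \mu in the mean equation.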
However, the problem with zeroes that you mentioned is not explored in those books, and I suppose it is outside the scope of most texts (i.e. a research problem). Zeroes are usually handled with zero-inflated and hurdle models for count data, but you have a mixture of 0 and a continuous variable. I’m guessing now, but perhaps you should try modelling the zeroes with a separate model (binary response, of course: 0 and non-0) and the non-0 part with a two-part model: ARIMA for the mean + (G)ARCH for the residuals.
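A rough sketch of that two-part idea in plain Python, only to show the structure. Everything here is a simplifying assumption of mine: the zero/non-zero part is estimated from transition frequencies instead of a proper binary regression, the mean is an AR(1) fitted by least squares instead of a full ARIMA, the variance is an ARCH(1) fitted by regressing squared residuals on their lag instead of maximum likelihood, and the non-zero values are treated as if they were consecutive. The data are simulated and all function names are hypothetical.

```python
import random

def ols_fit(x, y):
    """Least-squares fit of y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def fit_zero_part(series):
    """Binary part: P(non-zero at t | zero/non-zero at t-1) from frequencies."""
    counts = {0: [0, 0], 1: [0, 0]}  # counts[prev] = [next is zero, next is non-zero]
    for prev, cur in zip(series[:-1], series[1:]):
        counts[int(prev != 0)][int(cur != 0)] += 1
    probs = {}
    for state, (n_zero, n_nonzero) in counts.items():
        total = n_zero + n_nonzero
        probs[state] = n_nonzero / total if total else 0.5
    return probs

def fit_nonzero_part(series):
    """Continuous part: AR(1) mean plus ARCH(1) variance on non-zero values."""
    nz = [v for v in series if v != 0]  # assumption: gaps between them are ignored
    a, b = ols_fit(nz[:-1], nz[1:])                      # mean equation
    resid = [y - (a + b * x) for x, y in zip(nz[:-1], nz[1:])]
    sq = [e * e for e in resid]
    omega, alpha = ols_fit(sq[:-1], sq[1:])              # variance equation
    return {"a": a, "b": b, "omega": omega, "alpha": alpha,
            "last_value": nz[-1], "last_resid": resid[-1]}

def forecast_next(series, zero_probs, nz_fit):
    """Combine both parts into a one-step-ahead forecast."""
    p_nonzero = zero_probs[int(series[-1] != 0)]
    mean_nz = nz_fit["a"] + nz_fit["b"] * nz_fit["last_value"]
    var_nz = max(nz_fit["omega"] + nz_fit["alpha"] * nz_fit["last_resid"] ** 2, 0.0)
    return {"p_nonzero": p_nonzero,
            "mean_if_nonzero": mean_nz,
            "var_if_nonzero": var_nz,
            "expected_value": p_nonzero * mean_nz}

# Simulated stand-in data: an AR(1) level that is replaced by 0 about 40% of the time.
random.seed(0)
series, level = [], 0.0
for _ in range(500):
    level = 0.6 * level + random.gauss(0, 1000.0)
    series.append(0.0 if random.random() < 0.4 else level)

zero_probs = fit_zero_part(series)
nz_fit = fit_nonzero_part(series)
forecast = forecast_next(series, zero_probs, nz_fit)
print(zero_probs)
print(forecast)
```

For real data you would probably swap each piece for a proper estimator (a logistic regression for the zeroes, a library ARIMA and GARCH fit for the rest); the point here is only how the two parts are fitted separately and then combined.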