Hello,

I have been studying time series analysis and, after going through the available resources, decided to test what I have learned by taking part in the time series problem on AV (here).

However, I seem to be way off track, and I would like to understand whether I am proceeding in the right direction.

The data consists of hourly hit counts for a fictional website over the period 2012 to 2014.

My first step was to visualise the data. I observed high variance in the values, so I resampled to weekly frequency using the mean and forward-filled any missing values.
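For reference, the resampling step can be sketched roughly as below. The data here is synthetic and only stands in for the competition file; the `Count` column name comes from my code further down, while the date range, the dropped week and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data standing in for the real file (assumption):
# a DatetimeIndex and a single 'Count' column.
rng = np.random.default_rng(0)
idx = pd.date_range("2012-01-01", periods=24 * 7 * 12, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Count": rng.poisson(20, size=len(idx)).astype(float)}, index=idx)

# Drop one full week so that the weekly mean has a gap to fill.
df = df.drop(df.loc["2012-02-06":"2012-02-12"].index)

# Weekly mean, then carry the last known value forward over empty weeks.
weekly_mean = df["Count"].resample("W").mean()
ts_week = weekly_mean.ffill()

print(ts_week.head(8))
```

Note that forward filling repeats the previous week's mean, so a long gap would show up as a flat stretch in the weekly series.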

The initial graph indicates an upward trend, and the results of the ADF test confirm that the series is non-stationary:

Results of Dickey-Fuller Test:

Test Statistic 1.237717

p-value 0.996236

#Lags Used 8.000000

Number of Observations Used 144.000000

Critical Value (5%) -2.881829

Critical Value (1%) -3.476598

Critical Value (10%) -2.577589

dtype: float64

I then removed outliers, if any, by clipping the values to the 0.1% and 99.9% quantiles:

df['Count'] = df['Count'].clip(df['Count'].quantile(0.001), df['Count'].quantile(0.999))

Also, as the amount of variation differs vastly across the series, I took logs of the values to even out the fluctuations:

plot of log values

This does not detrend the series, so I took the first difference of the log values.
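In code, the log and first-difference steps look roughly like this. The numbers are made up for illustration; `ts_week_log_diff` is the name I use in the plotting calls below, and the other names are assumptions.

```python
import numpy as np
import pandas as pd

# Small made-up weekly series (assumption) with a multiplicative upward trend.
idx = pd.date_range("2012-01-01", periods=8, freq="W")
ts_week = pd.Series([10.0, 12.0, 15.0, 18.0, 24.0, 30.0, 36.0, 45.0], index=idx)

# The log compresses the growing variance; the first difference removes the trend.
ts_week_log = np.log(ts_week)
ts_week_log_diff = ts_week_log.diff().dropna()

# Each differenced value is log(y_t / y_{t-1}), i.e. an approximate weekly growth rate.
print(ts_week_log_diff)

# The transform is invertible: cumulate the differences, add the first log value, exponentiate.
recovered = np.exp(ts_week_log.iloc[0] + ts_week_log_diff.cumsum())
print(recovered)
```

The last two lines matter for forecasting: anything predicted on the differenced log scale has to go through the same inversion before it can be compared with the original counts.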

The ADF test results after this are as follows:

Results of Dickey-Fuller Test:

Test Statistic -5.005418

p-value 0.000022

#Lags Used 12.000000

Number of Observations Used 139.000000

Critical Value (5%) -2.882568

Critical Value (1%) -3.478294

Critical Value (10%) -2.577983

dtype: float64

With the above results I was confident that my series is stationary and that I could proceed with model building.

The ACF and PACF plots are then produced as follows:

from matplotlib import pyplot

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plot_pacf(ts_week_log_diff, lags=5)

plot_acf(ts_week_log_diff, lags=5)

pyplot.show()

Based on the graphs, I take the AR and MA orders to be 1 each and fit an ARIMA model.

The result summary is as follows:

```
                              ARMA Model Results
==============================================================================
Dep. Variable:                  Count   No. Observations:                  152
Model:                     ARMA(1, 1)   Log Likelihood                -149.446
Method:                       css-mle   S.D. of innovations              0.644
Date:                Fri, 15 Dec 2017   AIC                            306.893
Time:                        16:39:35   BIC                            318.988
Sample:                    01-22-2012   HQIC                           311.806
                         - 12-14-2014
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           0.0210      0.010      2.055      0.042       0.001       0.041
ar.L1.Count    -0.0495      0.105     -0.472      0.638      -0.255       0.156
ma.L1.Count    -0.8020      0.072    -11.200      0.000      -0.942      -0.662
                                    Roots
=============================================================================
                 Real           Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1          -20.2032           +0.0000j           20.2032            0.5000
MA.1            1.2469           +0.0000j            1.2469            0.0000
-----------------------------------------------------------------------------
```

The final predictions are way off the mark, and I am not entirely sure where I am making a mistake.

I am learning to do this analysis, and any direction on this would be highly appreciated.