Time Series Analysis - queries on practice problem



Hello,
I have been studying time series analysis and, after going through the available resources, decided to test what I have learned by taking part in the time series practice problem on AV (here).

However, I seem to be way off track and would like to understand whether I am proceeding in the right direction:

The data consists of hourly hit counts for a fictional website over the period 2012 to 2014.
My first step was to visualise the data. I observed a high variance in the values, so I resampled to weekly means and forward filled any missing values.
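For anyone following along, the weekly resampling step can be sketched as below. This is only an illustration on synthetic data: the column name `Count` comes from the post, while the dates, the Poisson counts, and the deliberately removed week are made up to show what the forward fill does.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data standing in for the real file (values are made up).
idx = pd.date_range("2012-01-01", periods=24 * 28, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
hourly = pd.DataFrame({"Count": rng.poisson(50, size=len(idx))}, index=idx)

# Drop one full week so the weekly mean has a gap to fill.
hourly = hourly.drop(hourly.loc["2012-01-09":"2012-01-15"].index)

# Weekly mean (weeks end on Sunday); empty weeks become NaN, then get forward filled.
weekly = hourly["Count"].resample("W").mean().ffill()
print(weekly)
```

Note that the forward-filled week simply repeats the previous week's mean, so long gaps would show up as flat stretches in the weekly series.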

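The initial graph indicates an upward trend, and the results of the ADF test are consistent with the series being non-stationary (the p-value is far above 0.05, so the unit-root null hypothesis cannot be rejected):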

Results of Dickey-Fuller Test:
Test Statistic 1.237717
p-value 0.996236
#Lags Used 8.000000
Number of Observations Used 144.000000
Critical Value (5%) -2.881829
Critical Value (1%) -3.476598
Critical Value (10%) -2.577589
dtype: float64

I then capped any outlier values, if present, by clipping to the 0.1% and 99.9% quantiles:
df['Count'] = df['Count'].clip(df['Count'].quantile(0.001), df['Count'].quantile(0.999))

Also, as the variation differs vastly across the series, I took the log of the values to even out the fluctuations:

plot of log values

This does not detrend the series, so I took the first difference of the log values.
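One thing that may be worth checking when forecasts look far off is whether they are still on the log-difference scale. The sketch below uses made-up numbers to show the transform and its inverse: accumulate the differences, add back the starting log value, then exponentiate.

```python
import numpy as np
import pandas as pd

# Made-up weekly values, only to illustrate the round trip.
weekly = pd.Series([100.0, 120.0, 150.0, 140.0, 180.0], name="Count")

log_weekly = np.log(weekly)
log_diff = log_weekly.diff().dropna()  # the series the model is fitted on

# Invert: cumulative sum of the differences, plus the first log value, then exp.
restored_log = log_diff.cumsum() + log_weekly.iloc[0]
restored = np.exp(restored_log)

print(restored)
```

For out-of-sample forecasts the same idea applies, except that the value added back is the last observed log value rather than the first.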

The ADF test results after this are as follows:

Results of Dickey-Fuller Test:
Test Statistic -5.005418
p-value 0.000022
#Lags Used 12.000000
Number of Observations Used 139.000000
Critical Value (5%) -2.882568
Critical Value (1%) -3.478294
Critical Value (10%) -2.577983
dtype: float64

With the above results I was confident that my series is stationary and that I could proceed with the model building.

The ACF and PACF plot results are then as follows:


Based on the graphs, I took the AR and MA orders to be 1 each and fit an ARIMA model.

The result summary is as follows:

                              ARMA Model Results
Dep. Variable:                  Count   No. Observations:                  152
Model:                     ARMA(1, 1)   Log Likelihood                -149.446
Method:                       css-mle   S.D. of innovations              0.644
Date:                Fri, 15 Dec 2017   AIC                            306.893
Time:                        16:39:35   BIC                            318.988
Sample:                    01-22-2012   HQIC                           311.806
                         - 12-14-2014
                  coef    std err          z      P>|z|      [0.025      0.975]
const           0.0210      0.010      2.055      0.042       0.001       0.041
ar.L1.Count    -0.0495      0.105     -0.472      0.638      -0.255       0.156
ma.L1.Count    -0.8020      0.072    -11.200      0.000      -0.942      -0.662
                  Real           Imaginary           Modulus         Frequency
AR.1          -20.2032           +0.0000j           20.2032            0.5000
MA.1            1.2469           +0.0000j            1.2469            0.0000
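As a sanity check on reading this summary, the reported roots follow directly from the coefficients: for a first-order model the AR polynomial is 1 - phi*z and the MA polynomial is 1 + theta*z. The short sketch below recomputes them from the rounded coefficients in the table, so it agrees with the reported values only up to rounding.

```python
ar_coef = -0.0495  # ar.L1.Count from the summary
ma_coef = -0.8020  # ma.L1.Count from the summary

# AR polynomial: 1 - phi*z = 0  ->  z = 1/phi
ar_root = 1.0 / ar_coef
# MA polynomial: 1 + theta*z = 0  ->  z = -1/theta
ma_root = -1.0 / ma_coef

print("AR root:", ar_root, "modulus:", abs(ar_root))
print("MA root:", ma_root, "modulus:", abs(ma_root))
```

Both moduli are greater than 1, which is the condition for the fitted ARMA(1, 1) to be stationary and invertible. The summary also shows that the AR coefficient is not significantly different from zero (p = 0.638), so most of the structure is being captured by the MA term.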

The final predictions are way off the mark, and I am not entirely sure where I am making a mistake.
I am still learning to do this kind of analysis, and any direction on this would be highly appreciated.



To learn and forecast time series, you can refer to this course:

This course covers most of the forecasting techniques, which will help you to get better predictions.