Time series with multiple attributes and multiple groups

time_series
regression

#1

I am working on the UK Traffic Dataset.

Here is my sample kernel:
my kernel on UK traffic data

This dataset consists of several groups and has date and hour columns, as it is hourly time series data.

How can I go about time series regression on this kind of dataset? Can anyone please explain?

I am one-hot encoding the time. Is that correct?


#2

Hi @shounakrockz47 ,

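  1. The first step will be to combine the Date and hour columns. If both are strings, you can simply use df['Date'] + ' ' + df['hour']; if hour is numeric, cast it with .astype(str) first.

  2. This column will have dtype object. Convert it into datetime format, for example with pd.to_datetime.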

  3. You can use ARIMA to build a model. I have not implemented vector AR yet, but I guess it should work fine.

  4. Also, if you want to use simpler algorithms like linear regression or xgb, you can use the Date and hour column to create features like weekday or weekend, working or non-working hour etc. Then drop the datetime column and fit the model.
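
The steps above can be sketched roughly as follows. This is only a minimal illustration on a toy frame: the column names Date, hour and AMV come from this thread, while the date format, the sample values and the 9-to-17 definition of a working hour are assumptions. Here the hour is added as a time offset instead of concatenating strings, which avoids dtype problems.

```python
import pandas as pd

# Toy frame standing in for the traffic data (values are made up).
df = pd.DataFrame({
    "Date": ["2018-09-17", "2018-09-17", "2018-09-22"],
    "hour": [9, 23, 14],
    "AMV": [120, 15, 60],
})

# Steps 1-2: parse the date, then add the hour as a time offset.
df["Date_Time"] = pd.to_datetime(df["Date"]) + pd.to_timedelta(df["hour"], unit="h")

# Step 4: calendar features for a non-time-series model.
df["weekday"] = df["Date_Time"].dt.dayofweek      # Monday=0 ... Sunday=6
df["is_weekend"] = df["weekday"] >= 5
# Assumed definition: 09:00-17:00 on a non-weekend day.
df["is_working_hour"] = df["hour"].between(9, 17) & ~df["is_weekend"]

# Drop the raw date columns before fitting a model such as linear regression or xgb.
features = df.drop(columns=["Date", "Date_Time"])
print(features)
```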

This training course might help -


#3

Hi @AishwaryaSingh thanks for your reply.

I am grouping each unique entry in this way.
[Screenshot from 2018-09-18 22-57-01]

So now I have multiple time series. How can I apply a single ARIMA model to them?

Can you please help ?


#4

Hi @shounakrockz47,

First of all, set the index of the dataset to Date_Time. Secondly, ARIMA works on a univariate series, so you can either deal with each time series individually or use a different forecasting model. For multivariate time series, you can use a VAR (vector autoregression) model.


#5

Hi @AishwaryaSingh, thanks for the prompt reply.

Is this a multivariate time series problem or a multiple time series problem?

If I group the unique roads by Road_Identifier, this becomes a multiple time series problem. Do you know any method for multiple time series other than modelling each series individually?


#6

Just to clarify, could you specify the difference between multivariate time series and multiple time series? As per my understanding, this looks like a multivariate series.

If I understand right, you are trying to group based on Road_Identifier. In that case, once the groups are aggregated, the number of rows would be the same as the number of unique values in Road_Identifier, and the Date_Time column would no longer represent a time series.

Classical forecasting models such as ARIMA work on a univariate series, as they make predictions based on its past values (SARIMAX can additionally take exogenous regressors). For several series modelled jointly, have a look at Vector AR and VARMA models.


#7

Hi @AishwaryaSingh, thanks again for the prompt reply.

As per my understanding, a multivariate time series is one where we have multiple input variables to describe the target variable.

Whereas I have Date_Time as the input variable and AMV as my target variable. So isn’t this a univariate problem? The only problem is that I have multiple univariate time series.

Please let me know if my understanding is wrong.


#8

That is correct. So you have a test dataset with the Road_Identifier column for the future time values?

A univariate time series would be the one which has a single variable. In this case, if you had only date_time and AMV, it would be a univariate problem.

The confusion is, when you say multiple time series, do you mean road_identifier and AMV as two time series? In that case, I assume you don’t have future values for AMV. Else, if you do have a test dataset which consists of future date_time and road_identifier value, you can use a simple ML model to make predictions.
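
To make the distinction in this exchange concrete, here is a small sketch on made-up data (the two roads and their AMV values are invented; the column names follow the thread). It shows the same long table viewed three ways: as multiple univariate series, as one wide multivariate table, and as an aggregate that no longer has a time axis.

```python
import pandas as pd

long_df = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2018-01-01 00:00", "2018-01-01 01:00"] * 2),
    "Road_Identifier": ["A", "A", "B", "B"],
    "AMV": [10, 12, 30, 33],
})

# Multiple univariate series: one AMV series per road, each indexed by time.
per_road = {
    road: g.set_index("Date_Time")["AMV"]
    for road, g in long_df.groupby("Road_Identifier")
}

# Multivariate view: one row per timestamp, one column per road.
# This is the shape a model such as VAR would expect.
wide_df = long_df.pivot(index="Date_Time", columns="Road_Identifier", values="AMV")

# Aggregating per road collapses time: one row per road, no time series left.
collapsed = long_df.groupby("Road_Identifier")["AMV"].mean()

print(wide_df)
print(collapsed)
```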


#9

Hi @AishwaryaSingh, thanks again for the prompt reply.

I am thinking of the problem in this way:

[image]

Is it a multivariate time series?


#10

@AishwaryaSingh, if you ignore the Road_Identifier column for the time being, you can see that this is a set of multiple univariate time series.

Can you please tell me how I can go about solving this problem?

Thanks a lot for your insightful replies once again. :)


#11

Hi @shounakrockz47,

If you ignore Road_Identifier, you can use an ARIMA model. You need to use Date_Time as the index, and then you will have only one column, which is AMV.


#12

Hi @AishwaryaSingh, but I have multiple time series, right? Won’t that be a problem?

I don’t want to aggregate the values for the same timestamp.


#13

To sum up, here is what you can do with this dataset:

  1. Group by the road identifier (A, B, etc.), then fit ARIMA for each group and make predictions. This will require you to fit the model multiple times (once per unique value in road_identifier).

  2. You can extract features from the date_time column and fit a simple random forest or xgb. In this case, you will have columns like road_identifier, day, month, year, hour, is_weekday, is_business_hour and one column for target. Repeat the same for future dates to create a test data and make predictions.
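
Option 2 could look roughly like this, assuming scikit-learn is available. Everything here is a sketch on synthetic data: the two roads, the AMV pattern, the integer encoding of road_identifier and the 9-to-17 business-hour window are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Two weeks of synthetic hourly data for two hypothetical roads.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=24 * 14, freq="h")
daily_cycle = 10 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)
train = pd.concat([
    pd.DataFrame({
        "date_time": idx,
        "road_identifier": road,
        "AMV": base + daily_cycle + rng.normal(0, 2, len(idx)),
    })
    for road, base in [("A", 100.0), ("B", 40.0)]
], ignore_index=True)

# Fixed mapping so train and test encode roads identically (a crude encoding).
road_codes = {r: i for i, r in enumerate(sorted(train["road_identifier"].unique()))}

def make_features(frame):
    dt = frame["date_time"].dt
    return pd.DataFrame({
        "road_code": frame["road_identifier"].map(road_codes),
        "day": dt.day, "month": dt.month, "year": dt.year, "hour": dt.hour,
        "is_weekday": (dt.dayofweek < 5).astype(int),
        "is_business_hour": dt.hour.between(9, 17).astype(int),  # assumed window
    })

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(make_features(train), train["AMV"])

# Build the same features for future timestamps and predict.
future_idx = pd.date_range(idx[-1] + pd.Timedelta(hours=1), periods=3, freq="h")
test = pd.DataFrame({
    "date_time": list(future_idx) * 2,
    "road_identifier": ["A"] * 3 + ["B"] * 3,
})
test["predicted_AMV"] = model.predict(make_features(test))
print(test)
```

A single model is fitted for all roads here, which is the main practical difference from option 1.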

On a side note, the previous sample you provided for the dataset was misleading because the concept of road_identifier wasn’t clear. Also, this is a really interesting problem, and it is the first time I’ve come across one like it. Thanks for this!


#14

Hi @AishwaryaSingh, I am now following approach 1, fitting the model multiple times.

Thanks for your help. Will let you know in case of any issues. :)


#15

How many unique values do you have in the road_identifier column?


#16

43338