Anomaly detection on a large time series dataset



How can one perform anomaly detection on a time series dataset that is larger than the local machine's RAM?
The data is a JSONL file of ~20 GB (my machine has 16 GB of RAM) with multiple attributes. I am looking for approaches one might take for this task.


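Instead of loading the whole file, you can read it incrementally and pass only a fixed time window of the stream at a time to AnomalyDetectionVec (from Twitter's AnomalyDetection R package), so memory use is bounded by the window size rather than the file size: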

AnomalyDetectionVec(ts_as_a_vector, max_anoms=0.02, period=60, direction='both', only_last=TRUE, plot=TRUE)
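A minimal sketch of the windowed, out-of-core loop, under several assumptions: the file is read with `jsonlite::stream_in` in pages, the monitored attribute is a numeric field whose name you pass in, and a simple robust z-score (median/MAD) stands in for AnomalyDetectionVec, since that package is installed from GitHub rather than CRAN. `detect_in_windows` and `robust_z_flags` are hypothetical names; any function that takes a numeric vector and returns one logical flag per point can be passed as `detector`.

```r
library(jsonlite)

# Stand-in detector (an assumption, NOT AnomalyDetectionVec): flags points
# whose robust z-score |x - median(x)| / mad(x) exceeds `threshold`.
robust_z_flags <- function(x, threshold = 3.5) {
  center <- median(x)
  s <- mad(x)  # base R's mad() is already scaled by 1.4826
  if (!is.finite(s) || s == 0) return(x != center)
  abs(x - center) / s > threshold
}

# Hypothetical helper: streams a JSONL file page by page, so memory holds at
# most `pagesize` new rows plus `window` rows of context from earlier pages.
# `field` is the assumed name of the numeric attribute to monitor.
# Returns the 1-based row numbers (file order) of the flagged points.
detect_in_windows <- function(path, field, window = 60L, pagesize = 10000L,
                              detector = robust_z_flags) {
  context <- numeric(0)  # tail of previously seen values
  seen <- 0L             # rows processed so far
  flagged <- integer(0)

  stream_in(file(path), pagesize = pagesize, verbose = FALSE,
            handler = function(page) {
              values <- as.numeric(page[[field]])
              combined <- c(context, values)
              # Score context + new rows, but keep flags for new rows only,
              # so no point is reported twice.
              new_flags <- tail(detector(combined), length(values))
              flagged <<- c(flagged, seen + which(new_flags))
              seen <<- seen + length(values)
              context <<- tail(combined, window)
            })
  flagged
}
```

For example, `detect_in_windows("data.jsonl", "value", window = 60, pagesize = 50000)` (file and field names are placeholders) never materializes the full 20 GB; only one page plus the context window is in memory at a time. Multiple attributes can be handled by calling the detector once per field inside the handler.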

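Setting the only_last argument to TRUE restricts the reported anomalies to the last period of the window (here the last 60 observations), so each call reports only on the most recent data instead of re-reporting anomalies across the whole window.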
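If you need a yes/no answer for just the newest observation each time a point arrives, a cheap alternative is to score that point against the rest of its window. A minimal sketch, assuming a robust z-score is acceptable in place of AnomalyDetectionVec; `latest_is_anomaly` is a hypothetical name and the series below is synthetic demo data:

```r
# Hypothetical helper (stand-in, not part of AnomalyDetection): returns TRUE
# if the newest value in `window` is an outlier relative to the earlier
# values, using the robust z-score |latest - median| / mad.
latest_is_anomaly <- function(window, threshold = 3.5) {
  history <- head(window, -1)
  latest <- window[length(window)]
  center <- median(history)
  s <- mad(history)  # scaled by 1.4826, comparable to a standard deviation
  if (!is.finite(s) || s == 0) return(latest != center)
  abs(latest - center) / s > threshold
}

# Usage: slide a 60-point window over a synthetic series, one point at a time.
x <- sin(seq_len(200) / 5)
x[150] <- 50
checks <- vapply(60:200, function(i) latest_is_anomaly(x[(i - 59):i]), logical(1))
hits <- which(checks) + 59L  # map window positions back to indices in x
```

Each check touches only the buffered window, so memory stays constant regardless of how large the file is.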

You can find some examples of anomaly detection here.