Which regression algorithm could be applied for correcting sensor values?

Please consider the sample dataset below.

In simple terms:
The sensors are defective and have been measuring incorrect values since 2000. We have 10 years of data containing both the measured and the actual values.

P.S. We don't have data for every combination of application and sensor type in every month.

Now we want an algorithm that predicts the actual values from the measured ones.

We tried XGBoost and CatBoost, creating another column named diff = measured - actual
and feeding it to the algorithm to identify the pattern. However, we are not sure which algorithm is appropriate. We suspect that a neural network or a time series model (ARIMA) could work, but we are unsure
because we only have 10 years of data at the monthly level.

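library(tidyverse)

# Rows filled with "." stand for the months omitted from this sample
# (between 02.2000 and 11.2010). Because of them, measured and actual
# are non-numeric columns in this sample.
train_data <- data.frame(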
  time = c(rep("01.2000",10),rep("02.2000",10),rep(".",3),rep("11.2010",10),rep("12.2010",10)),
  application = c(rep("factory",4),rep("residential",3),rep("research",3),
                  rep("factory",2),rep("residential",5),rep("research",3),
                  rep(".",3),
                  rep("factory",2),rep("residential",2),rep("research",6),
                  rep("factory",7),rep("residential",1),rep("research",2)),
  sensor = c(LETTERS[1:10],LETTERS[10:1],rep(".",3),LETTERS[c(5:1,10:6)],LETTERS[c(3:9,2,1,10)]), 
  measured = c(26.4,2000,1001,23.9,100000,0,1234,12098,34567,0,
               123,676,12,0,100,0,0,98,1,190,
               rep(".",3),
               3454,0,101,9,1,0,14,1298,677,0,
               264,20220,1851,3.9,1044,0,1764,0,34,0),
  actual =  c(26.4,2010,1001,23.9,100100,237,1234,12098,34567,19583,
              123,706,1112,156,100,650,109,98,10,190,
              rep(".",3),
              3454,10,101,19,10,40,44,1298,760,50,
              264,20220,1851,39,1048,870,1765,40,35,1110)
)

# Test data: the actual values for 01.2011 are unknown and need to be predicted
test_data <- data.frame(
  time = rep("01.2011",10),
  application = c(rep("factory",7),rep("residential",1),rep("research",2)),
  sensor = LETTERS[c(1,4,5,9,3,2,8,6,7,10)], 
  measured = c(26.4,100000,0,0,
               123,12,
               3454,0,20220,1851)
)

How can we predict/forecast the actual values for the 01.2011 data (test_data)?

I suppose a fine-tuned regression model will work for you. But if you want better accuracy and predictions closer to the real data, you should first treat this as a time series problem.
From my experience: I had 83% accuracy when I solved it as a plain regression problem, and it improved to ~94% when I solved it with a time series approach (considering moving averages, etc.) on the same regression model.
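
To make the "time series approach on the same regression model" idea concrete, here is a minimal sketch in base R. It is not the exact setup I used, only an illustration under assumptions: the toy data, the column names diff_lag1 and diff_ma3, the helper functions and the choice of lm() are all hypothetical. The idea is to add features built from past months only (last month's diff and a trailing 3-month moving average of diff) and to regress diff on them.

```r
# Hypothetical toy data: two sensors, 24 months, error grows over time.
set.seed(1)
n_months <- 24
toy <- expand.grid(month = seq_len(n_months), sensor = c("A", "B"),
                   stringsAsFactors = FALSE)   # ordered by month within sensor
toy$actual   <- 100 + 2 * toy$month + rnorm(nrow(toy), sd = 1)
slope        <- ifelse(toy$sensor == "A", 0.5, 1.0)
toy$diff     <- -(5 + slope * toy$month) + rnorm(nrow(toy), sd = 0.2)
toy$measured <- toy$actual + toy$diff          # so diff = measured - actual

# Features that only look at previous months (no leakage from the current one).
lag1 <- function(x) c(NA, head(x, -1))
trailing_mean <- function(x, k = 3) {
  # mean of the k values before position i; NA until k past values exist
  vapply(seq_along(x), function(i) {
    if (i <= k) NA_real_ else mean(x[(i - k):(i - 1)])
  }, numeric(1))
}
toy$diff_lag1 <- ave(toy$diff, toy$sensor, FUN = lag1)
toy$diff_ma3  <- ave(toy$diff, toy$sensor, FUN = trailing_mean)

# Hold out the last month, as with 01.2011 in the question.
train <- toy[toy$month <  n_months, ]
test  <- toy[toy$month == n_months, ]

# lm() drops the first rows with NA features by default.
fit <- lm(diff ~ measured + diff_lag1 + diff_ma3, data = train)

# Corrected value = measured minus the predicted diff.
test$actual_pred <- test$measured - predict(fit, newdata = test)
print(test[, c("sensor", "measured", "actual", "actual_pred")])
```

The same lag and moving-average columns could be fed to XGBoost or CatBoost instead of lm(). With your real data the features would have to be computed per application and sensor, and months without data would leave gaps in the lags.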

Yes, treating this as a time series forecasting problem and applying ARIMA or LSTM should give you better accuracy. But you can always give regression models a try, at least as an experiment.
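
For the ARIMA route, a rough sketch with base R's arima() could look like the following. Everything here is an assumption for illustration: the diff series is simulated, the AR(1) order is arbitrary, and measured_next is a made-up reading. It fits the model to the monthly diff of a single sensor, forecasts next month's diff and subtracts it from the new measured value. LSTM is not shown.

```r
# Hypothetical monthly diff series (measured - actual) for one sensor, 10 years.
set.seed(42)
diff_sim <- as.numeric(arima.sim(model = list(ar = 0.7), n = 120, sd = 1)) - 20
diff_ts  <- ts(diff_sim, start = c(2000, 1), frequency = 12)

# AR(1) with a mean term; the order is an arbitrary choice for this sketch.
fit <- arima(diff_ts, order = c(1, 0, 0))

# One-step-ahead forecast of the diff for the next month.
diff_forecast <- as.numeric(predict(fit, n.ahead = 1)$pred)

# Made-up measured value for the next month; corrected = measured - diff.
measured_next   <- 3454
actual_forecast <- measured_next - diff_forecast
print(c(diff_forecast = diff_forecast, actual_forecast = actual_forecast))
```

In practice you would fit one such model per sensor (or per application and sensor) and choose the order by comparing forecasts on held-out months. With only 120 monthly points per series, simple orders are probably the safer starting point.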

© Copyright 2013-2019 Analytics Vidhya