Incremental Updates to a Data Set (Automated Data Preparation Phase)



Hello Gurus,

I’m working on a project where the data set I use to generate predictive analytics grows over time as new observations are added. I therefore need to keep my analytics up to date by incorporating the new observations, so that the predictions stay accurate.

However, the data preparation phase is mostly a guided and highly iterative process; in short, it is difficult to automate.
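One common way to reduce that manual effort, sketched here under assumptions: once the iterative exploration settles, freeze the chosen preparation steps as an ordered list of functions and replay them on every refreshed data set. The step names `drop_missing` and `make_clipper`, the `amount` field, and its bounds are hypothetical placeholders, not part of any specific project.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def drop_missing(records: List[Record]) -> List[Record]:
    """Hypothetical step: discard observations that have any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def make_clipper(field: str, low: float, high: float) -> Step:
    """Hypothetical step factory: clamp one numeric field to [low, high]."""
    def clip(records: List[Record]) -> List[Record]:
        return [{**r, field: min(max(r[field], low), high)} for r in records]
    return clip


def prepare(records: List[Record], steps: List[Step]) -> List[Record]:
    """Replay the frozen preparation steps, in order, on the current data set."""
    for step in steps:
        records = step(records)
    return records


# The "frozen" result of the exploratory phase (illustrative values only).
PIPELINE: List[Step] = [drop_missing, make_clipper("amount", 0.0, 1000.0)]

if __name__ == "__main__":
    data = [{"amount": 50.0}, {"amount": None}, {"amount": 5000.0}]
    print(prepare(data, PIPELINE))  # [{'amount': 50.0}, {'amount': 1000.0}]
```

The exploration itself stays manual; only its outcome is captured, so each time new observations arrive the same steps can be rerun without a human in the loop.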

I’d appreciate your views on how to develop a proper approach to this. Am I still stuck in a data warehousing mentality, bringing the view maintenance problem into the data science/analytics domain? Or is this still a real challenge in developing packaged data science solutions that must regenerate their predictions as the data set grows?
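To make the view-maintenance analogy concrete: some preparation statistics, such as the mean and variance used to standardize a feature, can be maintained incrementally from the new observations alone, much like an incrementally maintained aggregate view. The minimal sketch below uses Welford's online algorithm; the class name `RunningStats` is hypothetical, and it only covers statistics that admit this kind of update (steps that need the full data set do not).

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Mean and variance maintained incrementally (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, new_values: Iterable[float]) -> None:
        """Fold in only the newly arrived observations; no full recomputation."""
        for x in new_values:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0

    def standardize(self, x: float) -> float:
        """Z-score a value using the statistics accumulated so far."""
        std = math.sqrt(self.variance)
        return (x - self.mean) / std if std > 0 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    stats.update([2.0, 4.0, 4.0])            # initial batch
    stats.update([4.0, 5.0, 5.0, 7.0, 9.0])  # later increment
    print(stats.mean, stats.variance)
```

After both batches the result matches a full recomputation over all eight values (mean 5, sample variance 32/7), which is the property that makes incremental maintenance safe here.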



I understand that I shouldn’t let the data set grow indefinitely; at some point I need to use a moving window to capture a subset of the data.
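A minimal sketch of such a moving window, assuming each observation carries a timestamp and observations arrive in time order. The `Observation` and `MovingWindow` names and the 30-day span are illustrative assumptions; a count-based window (keep the last N rows) would simply use `collections.deque(maxlen=N)` instead.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque, List


@dataclass(frozen=True)
class Observation:
    timestamp: datetime
    value: float


class MovingWindow:
    """Keeps only observations within `span` of the most recent one."""

    def __init__(self, span: timedelta) -> None:
        self.span = span
        self._items: Deque[Observation] = deque()

    def add(self, obs: Observation) -> None:
        """Append a new observation (assumed in time order) and evict stale ones."""
        self._items.append(obs)
        cutoff = obs.timestamp - self.span
        while self._items and self._items[0].timestamp < cutoff:
            self._items.popleft()

    def snapshot(self) -> List[Observation]:
        """Current subset to feed into the preparation and modeling steps."""
        return list(self._items)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    window = MovingWindow(timedelta(days=30))
    for day in (0, 10, 20, 45):
        window.add(Observation(start + timedelta(days=day), float(day)))
    print([o.value for o in window.snapshot()])  # [20.0, 45.0]
```

Eviction happens on insert, so the window never holds more than the chosen span and the cost of each update is proportional only to the number of expired observations.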