Hi AV users,
Let me give you a brief of what we are trying to achieve and the current methodology that we have. Kindly suggest alternatives.
We have POS (point of sales data) for a retail company at a store, item, week level. The most important datapoint is the quantity and sales for different items on a daily basis, which can be aggregated to weekly, monthly as we like. Most of the products (items) have promotions in different format (price reduction/clearance/ BOGO offers//Support features (like flyers etc). Suppose we have all the data consolidated as week_id, item_id, store_id, Region_id (several stores in one region), Category_id (several items in one category), promotion related data (flags or promo units/promo sales for an item).
Currently, we are building a model at region, category level ( because of lesser number of models to develop) and getting it for store | item level by looking at contribution of a store in a region/ item in a category as a percentage.
The technique that we are using is linear regression and the variables are different promotion variable flags (types), holiday_flag, pay_day_flag, temperature and other climate data, seasonality index etc. We are finding out both promo sales (keeping the coeff of promo related variables) and actual sales (all variables). We are finding the non promo baseline sales by subtracting these two.
IMO, this is not the right way to do this, since error in baseline estimates will be high. Also, baseline estimates will have high correlation with promotion. please suggest alternative ways to forecast non-promo baseline sales. The challenge is unavailability of continuos non promo time-series data . The objective of predicting the baseline sales is to get effectivess of promotions by attributing the difference of actual sales minus baseline sales to promotions.