My question is: in what sense is this considered a regression problem? We do not have historical time-series sales data from which to predict future sales. What we actually have is the annual sales of some products (the target variable) together with 11 other attributes describing the product (for example fat content, weight, visibility…) and the outlet (outlet type, size, location…). In what sense are we solving a regression-based prediction problem here?

# Difference between regression and time series

Hi @joudi, I think you are mixing up regression and time series.

If you have a continuous target variable, then it is a regression problem. For instance, in the BigMart sales problem the target is the sales figure, which is continuous. Hence it is a regression problem.

As for time series: when the data points are time dependent, it becomes a time series problem. A time series problem can itself be regression or classification, depending on whether the target is continuous or categorical.

Let me please ask another question. The normal case is that we use independent variables (in our case Outlet Id, size, location, item visibility… ) to predict the dependent variable (item outlet sales). But aren’t those “independent” variables dependent on each other? For example, when the Id is OUT010, the size, type and location type will always be the same for that outlet. This reduces the number of independent variables related to the outlet from 4 to only 1 (because if we know the Id we know the rest). And given that item weight, fat content and visibility are not very significant for the sales, that leaves us with only Item MRP and Outlet Id as the actual independent variables. How can we interpret this?

Hi, let me take up the questions one by one.

They can be dependent on each other, and they might be highly correlated. You need to determine this during the data exploration step and deal with it accordingly; you can't simply drop all of those variables.

But if you just drop these variables, you are missing out on a lot of information. The model should capture signals such as a higher weight combined with a certain fat content going with a higher price, and so on. The model needs that information.