Hi @shan4224

Let us take an example

Our training data looks like this

Our test data looks like this

Now, when we apply the trained model to the test data set, this is what happens:

```
# Build a toy dataset: y is piecewise, so a single straight line cannot fit it all
data1 <- data.frame(x = seq(from = 1, to = 100),
                    y = c(seq(from = 1, to = 30),
                          seq(from = 60, to = 90),
                          seq(from = 1, to = 39)))
row.names(data1) <- NULL
# Now let us take the first 60 entries as our training set
# and the last 40 entries as our test set
train <- data1[1:60, ]
test <- data1[61:100, ]
# Let us choose the simplest linear regression model
model <- lm(y ~ x, data = train)
# Predict on the test set. Note: model$coefficients[[1]] is the intercept
# and [[2]] is the slope, so the original manual formula had them swapped;
# predict() avoids that mistake entirely.
test$predictedY <- predict(model, newdata = test)
```
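To put a number on how far off the predictions are, we can compute the RMSE (root mean squared error) on the test set. This is a sketch of my own, not part of the original example:

```
# RMSE on the held-out test set: a large value signals the model
# extrapolates poorly beyond the range it was trained on
rmse <- sqrt(mean((test$y - test$predictedY)^2))
rmse
```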

**TEST 1**

The red line is the actual result; the blue line is the predicted result.

**TEST 2**

To address this, let us move 20 more entries from the test set into the training set:

```
train <- data1[1:80, ]
test <- data1[81:100, ]
# Refit the model on the larger training set and predict again
model <- lm(y ~ x, data = train)
test$predictedY <- predict(model, newdata = test)
```

A bit better

Now, if we instead exclude the initial 20-30 points from the training data set, we might get something different again. So you can see how the results change with the choice of split. This happens because we chose points 81:100 of the dataset as the test set. Choosing a block of later data points as the test set is called out-of-time validation. Compared to including all available data as training data and then running the model on production data, it gives a less biased estimate of how the model will perform on future data.
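To make the contrast concrete, here is a sketch (assumptions mine, not from the original example) comparing a random 80/20 split against the out-of-time 80/20 split on the same data:

```
# Random 80/20 split: test points are interleaved with training points
set.seed(1)
idx <- sample(1:100, 80)
rand_train <- data1[idx, ]
rand_test <- data1[-idx, ]
rand_model <- lm(y ~ x, data = rand_train)
rand_rmse <- sqrt(mean((rand_test$y - predict(rand_model, rand_test))^2))

# Out-of-time 80/20 split: test points all come after the training points
oot_train <- data1[1:80, ]
oot_test <- data1[81:100, ]
oot_model <- lm(y ~ x, data = oot_train)
oot_rmse <- sqrt(mean((oot_test$y - predict(oot_model, oot_test))^2))

c(random = rand_rmse, out_of_time = oot_rmse)
```

The out-of-time RMSE will typically be the higher of the two, which is exactly the point: it is the more honest estimate of how the model will do on data it has never seen.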

Let me know if this helped

Regards,

Anant