#1

The MSE is minimized according to the following equation:

As far as I know, the output of linear regression directly gives the values for which the above is minimized. So what is the bias-variance trade-off? Why is it a concern as long as the error is at its minimum?

#2

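The bias-variance trade-off generally comes into play when you are underfitting or overfitting the problem. Linear regression minimizes the error on the training data only, whereas what you usually care about is the error on data the model has not seen.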
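For reference, the standard decomposition behind the name can be written as below. The notation is an assumption added here, not taken from the question: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat f$ is the model fitted on a randomly drawn training sample, and the expectation is over both the training sample and the noise at a new point $x$.

$$
\mathbb{E}\big[(y - \hat f(x))^2\big]
= \big(\mathbb{E}[\hat f(x)] - f(x)\big)^2
+ \mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]
+ \sigma^2
$$

The first term is the squared bias, the second is the variance of the fitted model across training samples, and $\sigma^2$ is noise that no model can remove. Minimizing the training MSE does not by itself control this sum, which is the expected error at a new point.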

Overfitting :-

• Problem:
Suppose you have a training dataset and you have run a linear regression that gives you an R-squared of 0.9 and an RMSE of 10. You run the same model on your testing dataset and find that the RMSE shoots up to around 20! What is the problem here? Your model is overfitting the data. It has not learned the underlying pattern but has memorized your training data, so it fails to perform on a new dataset. It is like cramming a mathematics textbook in childhood: if the exam question is the same as in the textbook we score 100%, but if a similar question comes with new values, we fail to answer it correctly.
So technically your model explains most of the variance in the training dataset, but the solution does not generalize.

• Remedy:
In this case you would bias your model by regularizing/penalizing the model coefficients while fitting on the training dataset. What happens then? You might get, say, an R-squared of 0.8 and an RMSE of 15 on the training dataset, but also an RMSE of about 15 on the testing dataset. So by biasing your solution you have reduced the variance explained (the R-squared value) on the training data, but you were able to generalize the solution. In bias-variance terms, you accepted a little more bias in exchange for a model that is less sensitive to the particular training sample.
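The remedy can be tried out with a small experiment. This is only a sketch under made-up assumptions: the sine curve, the noise level, the degree-10 polynomial features, and the penalty strength `alpha=0.1` are all illustrative choices, and `make_data`, `poly_features`, `fit_ridge`, and `compare` are hypothetical helper names. The penalized fit is a ridge regression solved with plain NumPy least squares on an augmented system.

```python
import numpy as np

def make_data(n, rng, noise_sd=0.3):
    """Noisy samples of a smooth curve; curve and noise level are made up."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(0.0, noise_sd, size=n)
    return x, y

def poly_features(x, degree=10):
    """Columns 1, x, x^2, ..., x^degree (column 0 is the intercept)."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, alpha):
    """Minimize ||Xw - y||^2 + alpha * sum(w[1:]**2).

    alpha = 0 reduces to ordinary least squares.
    """
    n_features = X.shape[1]
    penalty = np.sqrt(alpha) * np.eye(n_features)
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    # Stacking the penalty rows under X turns ridge into a plain
    # least-squares problem, which lstsq solves stably.
    X_aug = np.vstack([X, penalty])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

def rmse(X, y, w):
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

def compare(seed=0, alpha=0.1):
    """Return {model name: (train RMSE, test RMSE)} for OLS and ridge."""
    rng = np.random.default_rng(seed)
    x_train, y_train = make_data(20, rng)   # small training set
    x_test, y_test = make_data(500, rng)    # large held-out set
    X_train, X_test = poly_features(x_train), poly_features(x_test)
    results = {}
    for name, a in [("ols", 0.0), ("ridge", alpha)]:
        w = fit_ridge(X_train, y_train, a)
        results[name] = (rmse(X_train, y_train, w), rmse(X_test, y_test, w))
    return results

for name, (train_rmse, test_rmse) in compare().items():
    print(f"{name:>5}: train RMSE = {train_rmse:.3f}, test RMSE = {test_rmse:.3f}")
```

The unpenalized fit always has the lower (or equal) training RMSE, since that is exactly what least squares minimizes. Whether the penalized fit wins on the test set, and by how much, depends on the data and on `alpha`, so it is worth rerunning with a few seeds and penalty values.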

Underfitting :-
Underfitting is the opposite case: you have biased the solution so much that the variance explained by your model is too low. It is similar to skipping some chapters before an exam: if you have not studied them, how will you answer questions on them? (R-squared: 0.5 and RMSE: 20 on training, RMSE of 21 on testing.) So you have to reduce the bias so that your model learns enough (R-squared: 0.8 and RMSE: 15 on training, RMSE of 15 on testing) to give valid predictions on both the training and testing datasets.
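Both failure modes can be seen in one sweep over model flexibility. The sketch below is illustrative only: the data-generating curve, the noise level, and the list of polynomial degrees are assumptions, and `train_test_rmse` and `sweep` are hypothetical helper names. It fits polynomials of increasing degree with `np.polyfit` and reports training and testing RMSE for each.

```python
import numpy as np

def train_test_rmse(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial of the given degree by least squares and score it."""
    coeffs = np.polyfit(x_train, y_train, degree)

    def score(x, y):
        return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

    return score(x_train, y_train), score(x_test, y_test)

def sweep(degrees=(0, 1, 3, 5, 9), seed=0):
    """Return {degree: (train RMSE, test RMSE)} on synthetic data."""
    rng = np.random.default_rng(seed)

    def sample(n):
        # Made-up ground truth: a sine curve plus Gaussian noise.
        x = rng.uniform(-1.0, 1.0, size=n)
        return x, np.sin(np.pi * x) + rng.normal(0.0, 0.3, size=n)

    x_train, y_train = sample(30)
    x_test, y_test = sample(500)
    return {d: train_test_rmse(d, x_train, y_train, x_test, y_test)
            for d in degrees}

for degree, (train_rmse, test_rmse) in sweep().items():
    print(f"degree {degree}: train RMSE = {train_rmse:.3f}, "
          f"test RMSE = {test_rmse:.3f}")
```

Training RMSE can only go down as the degree grows, because each model contains the previous one. One would expect the lowest degrees to score poorly on both sets (underfitting) and the highest degrees to show a widening gap between training and testing RMSE (overfitting), though the exact numbers depend on the seed and noise level.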