What is the meaning of overfitting and underfitting?

overfitting

#1

Can anyone please help me understand the concept of overfitting and underfitting in layman’s terms?

I have read many links on Google but am still not able to understand it properly.

@Lesaffrea @kunal @shuvayan @Sunil


#2

While developing a model, our objective is to be able to predict / forecast the dependent variable. When the accuracy of prediction is low even on the data the model was built from, we say the model is underfitting. When the accuracy is very high on the known (training) cases but drops noticeably on new cases, we say the model is overfitting. Overfitting can go unnoticed when the test data is just a subset of the training data, because the model is then scored on cases it has already seen. It can also creep in when the results of several different models are combined and the combination is tuned on that same data. Underfitting arises when the model is not well specified.
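To make the difference visible, here is a minimal sketch in Python (NumPy only). Everything in it is an assumption made for illustration: the data is synthetic (`y = 2x + 1` plus noise), and the three "models" are simple stand-ins, namely predicting the average for everyone, a straight line, and a model that just memorises the training cases and answers with the nearest one.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
    x = rng.uniform(0.0, 10.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=n)
    return x, y

x_train, y_train = make_data(200)
x_test, y_test = make_data(200)  # drawn separately, never used for fitting

def mse(y_true, y_pred):
    # Mean squared error: lower means more accurate.
    return float(np.mean((y_true - y_pred) ** 2))

# Model 1 (too simple): always predict the average of the training targets.
def predict_mean(x_new):
    return np.full(x_new.shape, y_train.mean())

# Model 2: an ordinary least-squares straight line.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict_line(x_new):
    return slope * x_new + intercept

# Model 3 (memoriser): return the target of the nearest training case.
def predict_nearest(x_new):
    idx = np.abs(x_new[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]

models = {"mean": predict_mean, "line": predict_line, "nearest": predict_nearest}
scores = {
    name: (mse(y_train, f(x_train)), mse(y_test, f(x_test)))
    for name, f in models.items()
}

for name, (train_err, test_err) in scores.items():
    print(f"{name:8s} train MSE = {train_err:7.3f}   test MSE = {test_err:7.3f}")
```

The pattern to look for is the gap between the two columns. The memoriser is perfect on the cases it has seen and worse than the line on new ones (overfitting), while the average-only model is poor on both (underfitting).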


#3

Thanks @abhiranjan07. But can you help me understand this concept in a slightly less technical way? Is that possible?

And by the way, how can we know that the accuracy of prediction is too high?


#4

Hi Rohit,

Suppose you have built a model to see how your sales are influenced by the price of a commodity over a period of 1 year (Jan 2015 - Dec 2015). You then want to estimate what your sales for January 2016 would be if you fix the price at, say, 100. The model estimates your sales to be 1000. Then you compare this value with the actual sales for January 2016.

Scenario 1: Say your actual sales for Jan 2016 are 1001. In this case we may say that the model is a good fit, as it is able to closely forecast the sales for Jan 2016.

Scenario 2: Say your actual sales for Jan 2016 are 950. In this case the forecast is clearly off. If the model also fits the 2015 data poorly, we say it is under fit: it has not captured the relationship between price and sales.

Scenario 3: Say you test the same 1 year model on the sales values of December 2015, which were part of the data used to build it. You will find that the model gives you almost the exact value of sales for December 2015. If it matches the data it was built on this closely but does much worse on new data, we say the model is over fitted. The drawback is that if you use this model to predict the values for, say, Jan 2016, you might not get as good a result as you expected from testing the model on the December 2015 data.

Though this is not the best example for explaining under fitting and over fitting, as I have used only one data point, I hope you get a rough idea of what under fitting and over fitting are.


#5

Hey @abhiranjan07, I understood scenarios 1 and 2. Thanks :slightly_smiling:
Is there some correction in scenario 3? I couldn’t understand that…

What I understood is:

If I create a model for Dec 2015 and try to predict Jan 2016 using that model, I may not get as good a result as I got for Dec 2015. Is that why the model is overfitted? Is that what you are trying to say? If not, can you explain just scenario 3? Sorry for the trouble :frowning:


#6

Hi Rohit,

Let me give you an analogy to explain overfitting and underfitting.

Overfitted models are like hardcore technical guys:

  • They know a lot about a particular field, like a programmer
  • Ask them anything about coding (even in detail) and they’ll probably be able to answer you, and pretty precisely too.
  • But ask them why the oil price fluctuates and they’ll probably take a guess and say something cranky

Parsimonious models are like the MBA guys

  • They have a broad understanding of a wide variety of subjects
  • They will be able to discuss most topics with you relatively easily and can help you generally
  • They have a focus on width of knowledge rather than depth

Underfitted models are like those engineers who wanted to be writers but were forced into engineering by their parents

  • They will neither know engineering nor writing
  • They never had their heart in what they did and have insufficient knowledge of everything

In terms of modeling, you can understand them as:

  • Overfitting: too much focus on the training set (engineering); the model learns complex relations which may not be valid in general for new data (the test set)
  • Parsimonious: has learnt the relations in the training data in optimum depth to be able to predict fairly well on the test set
  • Underfitting: too little focus on the training set. Good neither for training nor for testing
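If you want to see the three cases as numbers, here is a rough sketch in Python using NumPy. The setup is entirely assumed for illustration: noisy samples of a sine curve, with polynomial degrees 1, 3 and 14 standing in for the underfitted, parsimonious and overfitted models. The degrees are arbitrary picks, not a recommendation.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def true_signal(x):
    # The underlying relation the models are trying to learn (assumed).
    return np.sin(2.0 * np.pi * x)

# 15 evenly spaced noisy training points and 200 noisy test points.
x_train = np.linspace(0.0, 1.0, 15)
y_train = true_signal(x_train) + rng.normal(0.0, 0.2, size=x_train.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = true_signal(x_test) + rng.normal(0.0, 0.2, size=x_test.size)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

errors = {}
for name, degree in [("underfit", 1), ("parsimonious", 3), ("overfit", 14)]:
    # Least-squares polynomial fit of the given degree on the training set only.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    errors[name] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))

for name, (train_err, test_err) in errors.items():
    print(f"{name:13s} train MSE = {train_err:.4f}   test MSE = {test_err:.4f}")
```

The degree-14 curve passes through every training point, so its training error is practically zero, yet it swings wildly between those points and does badly on the test set. The straight line is poor on both sets, and the cubic sits in between on training error while doing best on the test set.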

Hope this helps.

Have a good day!
Aarshay


#7

Thanks a ton @Aarshay … Now I totally get the concept… Thanks a lot :slight_smile: :slightly_smiling:

Perfect way of explaining the concept :smiley: :smiley:


#8

Over-fitting: stereotyping people based on the few examples you’ve met. :grin: :wink:


#9

Most welcome :slightly_smiling:

Glad you understood it!


#10

@Aarshay - Your comparison of the MBA/Engg guys was nice… I laughed reading it… :slight_smile:


#11

Perfect explanation…
The best I have seen so far…

I was very confused about this until now, but you totally cleared it up, man…

Thanks…


#12

Hello @abhiranjan07, can you please tell me what the reasons for overfitting and underfitting are?

Regards,
Prasant Sahoo


#13

The primary reason for under fitting is the inability of the model to capture the variation in the dependent variable. This could arise for the following reasons:

  1. Incorrect model structure, i.e. non-inclusion of required explanatory variables
  2. Incorrect transformation of independent variables.

Over fitting of a model could be due to:

  1. Inclusion of redundant variables
  2. Over training of the model in the process of trying to achieve 100 percent accuracy on the training data.
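A small sketch of the first cause in each list, in Python with NumPy. All of it is assumed for illustration: the data is synthetic, with one genuinely useful explanatory variable and 25 redundant ones that are pure noise, and the helper names (`make_data`, `fit_and_score`) are made up for this example. The same linear regression is fitted three times: without the useful variable, with only the useful variable, and with the useful variable plus all the redundant ones.

```python
import numpy as np

rng = np.random.default_rng(7)
n_train, n_test, n_noise = 30, 500, 25

def make_data(n):
    # One variable that truly drives y, plus n_noise unrelated variables.
    useful = rng.normal(size=(n, 1))
    redundant = rng.normal(size=(n, n_noise))
    y = 3.0 * useful[:, 0] + rng.normal(0.0, 1.0, size=n)
    return useful, redundant, y

def with_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_and_score(X_train, y_train, X_test, y_test):
    # Ordinary least squares on the training set, scored on both sets.
    A_train, A_test = with_intercept(X_train), with_intercept(X_test)
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    train_mse = float(np.mean((y_train - A_train @ coef) ** 2))
    test_mse = float(np.mean((y_test - A_test @ coef) ** 2))
    return train_mse, test_mse

u_tr, r_tr, y_tr = make_data(n_train)
u_te, r_te, y_te = make_data(n_test)

# Required variable left out: only one irrelevant variable is used.
missing = fit_and_score(r_tr[:, :1], y_tr, r_te[:, :1], y_te)
# Only the required variable.
small = fit_and_score(u_tr, y_tr, u_te, y_te)
# Required variable plus all the redundant ones.
large = fit_and_score(np.hstack([u_tr, r_tr]), y_tr, np.hstack([u_te, r_te]), y_te)

for name, (train_err, test_err) in [("missing", missing), ("small", small), ("large", large)]:
    print(f"{name:8s} train MSE = {train_err:7.3f}   test MSE = {test_err:7.3f}")
```

Leaving out the required variable gives high error everywhere (under fitting). Adding the redundant variables pushes the training error down, because the model starts fitting the noise in the 30 training rows, but the error on the unseen rows goes up (over fitting).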

#14

Thanks a lot @abhiranjan07