Overfitting and underfitting

machine_learning

#1

Hi Experts,

Please kindly point me to code for tracking overfitting or underfitting in a machine learning algorithm, along with troubleshooting techniques.

Regards,
Tony


#2

I don’t know if I understand your question, but I would suggest plotting the error on the train and test sets. If the error on the train set is OK but the error on the test set is large, the model is overfitting. If the error is large on both the train set and the test set, you have an underfitting problem.
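
To make the comparison concrete, here is a minimal sketch, assuming a toy regression problem (noisy sine data fitted with polynomials of different degrees). The data, the `diagnose` helper, and its `acceptable` threshold are illustrative assumptions rather than a standard recipe, and the numbers are printed instead of plotted to keep the dependencies down to NumPy.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))

def diagnose(train_err, test_err, acceptable=0.1):
    """Rule of thumb from the post above; `acceptable` is an assumed threshold."""
    if train_err > acceptable:
        return "underfitting"   # error is already large on the train set
    if test_err > acceptable:
        return "overfitting"    # train error is fine, test error is large
    return "ok"

rng = np.random.default_rng(0)

def make_data(n):
    """Toy data: a sine curve plus Gaussian noise (illustrative only)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

results = {}
for degree in (1, 3, 9):
    # Fit on the train set only, then score on both sets.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    test_err = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree={degree}  train={train_err:.3f}  test={test_err:.3f}  "
          f"-> {diagnose(train_err, test_err)}")
```

The same loop works with any model: replace the `np.polyfit` call with your own training step and keep the two error computations side by side.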


#3

Hi Andrei,

My question was: how do we resolve overfitting and underfitting problems on the data?

Regards,
Tony


#4

Sorry, now I understand. The answer is: it depends. In general, overfitting is a consequence of a poor choice of hyperparameters, but sometimes the model you selected is simply prone to overfitting and there is little you can do about it. Underfitting has the same causes.

I recommend you search for cross-validation; it is a tool for observing overfitting and underfitting.
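
As a rough sketch of what such a cross-validation test can look like, the snippet below implements k-fold splitting by hand with NumPy and uses it to compare one hyperparameter (the polynomial degree) on toy data. The helper names `k_fold_indices` and `cross_val_mse`, the data, and the range of degrees are all assumptions made for illustration; libraries such as scikit-learn provide ready-made equivalents.

```python
import numpy as np

def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)

def cross_val_mse(x, y, degree, k=5, seed=0):
    """Average validation MSE of a polynomial of the given degree over k folds."""
    folds = k_fold_indices(len(x), k, np.random.default_rng(seed))
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        scores.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(scores))

# Toy data (illustrative only): a noisy sine curve.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 60)

# Score each candidate hyperparameter value and keep the best one.
cv_scores = {d: cross_val_mse(x, y, d) for d in range(1, 10)}
best_degree = min(cv_scores, key=cv_scores.get)

for d, score in cv_scores.items():
    print(f"degree={d}  cv_mse={score:.3f}")
print("selected degree:", best_degree)
```

A high score at low degrees points to underfitting, and a score that rises again at high degrees points to overfitting, so the table doubles as a diagnostic.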

You can also post more about your problem: which algorithm you are using, what kind of data you have…


#5

Overfitting: The model learns the training data well and gets a good accuracy score on it, but performs poorly on unseen data. Applying k-fold cross-validation helps here: it gives a more reliable estimate of performance on unseen data, so you can choose the model and hyperparameters that also perform well on the test data.
Underfitting: Poor performance on the training data and also on unseen data. The model fails to learn the patterns in the training data set. A common solution is to add more features.
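
To illustrate the underfitting remedy, here is a minimal sketch, assuming toy data whose true relation is quadratic. A straight-line model underfits it, and adding one extra feature (the squared input) brings the training error down. The data and the `fit_and_train_mse` helper are assumptions for illustration only.

```python
import numpy as np

def fit_and_train_mse(features, y):
    """Least-squares fit on the given feature matrix; returns the training MSE."""
    weights, *_ = np.linalg.lstsq(features, y, rcond=None)
    return float(np.mean((y - features @ weights) ** 2))

# Toy data (illustrative only): the true relation is quadratic.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 200)
y = x ** 2 + rng.normal(0.0, 0.5, 200)

ones = np.ones_like(x)
base_features = np.column_stack([ones, x])           # intercept + x
more_features = np.column_stack([ones, x, x ** 2])   # adds a squared feature

mse_base = fit_and_train_mse(base_features, y)
mse_more = fit_and_train_mse(more_features, y)

print(f"train MSE with [1, x]:      {mse_base:.3f}")
print(f"train MSE with [1, x, x^2]: {mse_more:.3f}")
```

Extra features only help when they carry information the model was missing, so it is worth checking the error on unseen data as well before keeping them.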


#6

Hi Malathi,

How does cross validation resolve the issue?

Please elaborate.
Thanks in advance.

Regards,
Tony


#7

This article may help you understand.


#8

Can you tell me how to find the error on the train set?


#9

Hi @swarup17,

The train set is for training your model. If you want to check your performance using the train set, you can set aside a small hold-out sample from it (you do not train your model on this hold-out sample). Once you have trained your model on the rest of the train set, you can check your model’s performance on the hold-out sample.
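
A minimal sketch of that procedure, assuming a toy linear dataset and a hypothetical `holdout_split` helper (the 20% hold-out fraction is an arbitrary choice): the train error is computed by predicting on the same rows the model was fitted on, and the hold-out error by predicting on the rows that were set aside.

```python
import numpy as np

def holdout_split(x, y, holdout_fraction=0.2, seed=0):
    """Shuffle the rows and set aside a fraction of them as a hold-out sample."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_holdout = int(round(holdout_fraction * len(x)))
    hold_idx, train_idx = idx[:n_holdout], idx[n_holdout:]
    return x[train_idx], y[train_idx], x[hold_idx], y[hold_idx]

# Toy data (illustrative only): a noisy straight line.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, 100)
y = 3.0 * x + 1.0 + rng.normal(0.0, 1.0, 100)

x_tr, y_tr, x_ho, y_ho = holdout_split(x, y)

# Train on the remaining rows only.
slope, intercept = np.polyfit(x_tr, y_tr, 1)

# Train error: predictions on the rows used for fitting.
train_error = float(np.mean((y_tr - (slope * x_tr + intercept)) ** 2))
# Hold-out error: predictions on rows the model never saw.
holdout_error = float(np.mean((y_ho - (slope * x_ho + intercept)) ** 2))

print(f"train MSE:    {train_error:.3f}")
print(f"hold-out MSE: {holdout_error:.3f}")
```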