This post is in response to Pulkit Sharma’s (@PulkitS) article on Faster R-CNN. I don’t understand how to check whether the model is overfitting during training. I know the theory, I just don’t know how to implement it. I have implemented the ‘data_gen_val’ validation data generator, but I don’t know how to use it or which functions to call. Any suggestions?
Overfitting is when a supervised machine learning model fails to give generalized results or predictions. It happens because the model over-learns the nuances
of the training data. In that process, it loses the flexibility to handle a data point which might occur in real life but is
not present in the training data. If the trained model is used to make a prediction on such a data point, it is likely to give wrong results.
In terms of the bias-variance tradeoff, overfitting comes with increased model complexity (low bias, high variance), which might improve the performance on one dataset, the training data, without carrying over to other data.
For this reason, an overfit model would give very accurate predictions on the train data but not on the test data. Here are the ways to check if a
model is overfitting.
The model evaluation metric (Accuracy, RMSE, MAE, MAPE, F1 score etc.) is significantly worse on the test data than on the train data. The test set should
come from data which has not been used for training or, better, from the production scenario.
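For a model that is trained over many epochs, the same train-versus-held-out comparison can be made after every epoch. The sketch below is only an illustration and is not code from the article: `first_overfitting_epoch` and its `patience` parameter are hypothetical names, and the two loss lists are made-up numbers standing in for values you would record yourself, for example by averaging the loss over batches drawn from a validation generator such as ‘data_gen_val’.

```python
def first_overfitting_epoch(train_losses, val_losses, patience=3):
    """Return the epoch with the best validation loss if the validation loss
    then failed to improve for `patience` epochs while the training loss was
    still lower than at that best epoch; return None otherwise."""
    if len(train_losses) != len(val_losses):
        raise ValueError("need one training and one validation loss per epoch")

    best_epoch = 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] < val_losses[best_epoch]:
            best_epoch = epoch  # validation loss is still improving
        elif (epoch - best_epoch >= patience
              and train_losses[epoch] < train_losses[best_epoch]):
            # Validation stalled while training kept improving.
            return best_epoch
    return None


# Made-up per-epoch losses, purely for illustration.
train = [1.00, 0.80, 0.65, 0.52, 0.41, 0.33, 0.26]
val = [1.05, 0.90, 0.82, 0.80, 0.83, 0.87, 0.92]
print(first_overfitting_epoch(train, val))  # prints 3
```

A training loss that keeps falling while the validation loss turns upward is the usual sign of overfitting, and the returned epoch is a reasonable point at which to stop training or to keep the saved weights.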
Run a k-fold (k=5 or 10) cross validation on the model and plot the model evaluation metric for each fold against the fold number. If the
model is overfitted, the value of the metric would fluctuate noticeably across the folds/iterations.
Cross validation just creates several train-test partitions of the same dataset, trains the model on each train set and tests it on the matching test set. So, we
have as many values of the model evaluation metric as there are partitions (or folds or iterations).
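The partitioning described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the data is synthetic, the model is a simple least-squares line, and the helper names (`k_fold_indices`, `fit_line`, `rmse`, `cross_val_rmse`) are made up for this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (closed form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cross_val_rmse(xs, ys, k=5):
    """Train on k-1 folds and test on the remaining fold, once per fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [a * xs[i] + b for i in test_idx]
        scores.append(rmse([ys[i] for i in test_idx], preds))
    return scores


# Synthetic data: y = 2x + 1 plus Gaussian noise (illustrative only).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

scores = cross_val_rmse(xs, ys, k=5)
for fold, score in enumerate(scores, start=1):
    print(f"fold {fold}: RMSE = {score:.3f}")
```

Similar scores across the folds suggest a stable model, while a wide spread between folds is the fluctuation to watch for.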
Apart from these, certain choices made while building the model tend to lead to overfitting. For example,
- Using a very complex polynomial (high degree, with multiple transformations like square root, log, exponential etc.) for a polynomial regression
- Training any model on a very small dataset (for example, fewer than 1000 data points)
- Using a very large tree depth for a decision tree or for any tree-based ensemble method (especially bagging methods like Random Forest)
- Using a very small number like k=1 or 2 for the k parameter in the k-nearest neighbor algorithm
- Using a very high number of hidden layers or a high number of nodes in a hidden layer in neural networks
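To make one of the items above concrete, here is a small sketch of how the k parameter of k-nearest neighbors affects the gap between train and test accuracy. Everything in it is an assumption made for illustration: the data is synthetic with 20% of the labels deliberately flipped, and `knn_predict`, `accuracy` and `make_data` are hypothetical helper names, not part of any library.

```python
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: (train_points[i][0] - query[0]) ** 2
        + (train_points[i][1] - query[1]) ** 2,
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_points, train_labels, points, labels, k):
    """Fraction of `points` whose predicted label matches `labels`."""
    hits = sum(
        knn_predict(train_points, train_labels, p, k) == y
        for p, y in zip(points, labels)
    )
    return hits / len(points)


def make_data(n, rng, noise=0.2):
    """Label is 1 when x0 + x1 > 1, then flipped with probability `noise`."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = []
    for x0, x1 in points:
        label = int(x0 + x1 > 1)
        if rng.random() < noise:
            label = 1 - label
        labels.append(label)
    return points, labels


rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)

for k in (1, 15):
    train_acc = accuracy(train_x, train_y, train_x, train_y, k)
    test_acc = accuracy(train_x, train_y, test_x, test_y, k)
    print(f"k={k:2d}  train accuracy={train_acc:.2f}  test accuracy={test_acc:.2f}")
```

With k=1 every training point is its own nearest neighbor, so the train accuracy is perfect even though a fifth of the labels are noise. The test accuracy is what shows how much of that was memorization, and a larger k typically narrows the gap.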
A generic way to remove/reduce overfitting is to add a regularization term to the cost function. The form of the regularization term depends on the cost
function. The regularization term basically penalizes the model for becoming more complex and hence tries to keep it more generic.
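As one common instance of this idea, an L2 (ridge) penalty can be added to a squared-error cost. The sketch below assumes the simplest possible model, a single weight w with no intercept, so that the minimizer has a closed form. The function names, the data and the values of `lam` are all made up for illustration.

```python
def ridge_weight(xs, ys, lam):
    """Minimise sum((y - w*x)**2) + lam * w**2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def penalised_cost(xs, ys, w, lam):
    """Squared-error cost plus the L2 regularization term."""
    data_term = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return data_term + lam * w ** 2


# Made-up data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in (0.0, 1.0, 10.0, 100.0):
    w = ridge_weight(xs, ys, lam)
    print(f"lambda={lam:6.1f}  w={w:.3f}  cost={penalised_cost(xs, ys, w, lam):.3f}")
```

A larger `lam` pulls the weight towards zero, which is the sense in which the penalty discourages complexity. In practice `lam` is usually chosen by comparing validation scores rather than fixed in advance.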