Understanding the results of this ML project

machine_learning
data_science
python

#1

I ran my code and got the following results in Python:

  1. For Naive Bayes

                     precision    recall  f1-score   support
    
     Less than 50k       0.98      0.85      0.91     93576
     More than 50k       0.24      0.72      0.36      6186
    
     avg / total         0.93      0.84      0.88     99762

  2. For XGBoost

                     precision    recall  f1-score   support
    
     Less than 50k       0.96      0.99      0.98     93576
     More than 50k       0.77      0.37      0.50      6186
    
     avg / total         0.95      0.95      0.95     99762
    

As can be seen, the precision of the minority class ("More than 50k") has increased from 0.24 with Naive Bayes to 0.77 with XGBoost, which I guess is what we wanted in this project. What do the recall and F1-score indicate in this case, in terms of predicting less than or more than 50k? I read the theoretical definitions on Wikipedia but could not relate them specifically to this project. Can you please explain what each entry in this table means?
I got this result by using metrics.classification_report in Python.
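Two entries in these tables are easy to check by hand. support is the number of instances that truly belong to each class in the evaluated data (93576 + 6186 = 99762). The avg / total row, printed by older scikit-learn releases, is to my understanding the support-weighted average of the per-class rows rather than a plain mean. The minimal sketch below assumes that weighting and re-derives the bottom rows from the rounded per-class numbers; the helper name weighted_average is hypothetical.

```python
# Per-class rows copied from the two reports: (precision, recall, f1, support)
REPORTS = {
    "Naive Bayes": [(0.98, 0.85, 0.91, 93576), (0.24, 0.72, 0.36, 6186)],
    "XGBoost": [(0.96, 0.99, 0.98, 93576), (0.77, 0.37, 0.50, 6186)],
}


def weighted_average(rows):
    """Support-weighted mean of precision, recall and F1 (hypothetical helper).

    Assumes the 'avg / total' row weights each class by its support.
    """
    total_support = sum(row[3] for row in rows)
    averages = [
        sum(row[i] * row[3] for row in rows) / total_support
        for i in range(3)  # 0 = precision, 1 = recall, 2 = f1-score
    ]
    # Round to two decimals, like the report does
    return [round(a, 2) for a in averages], total_support


for model, rows in REPORTS.items():
    print(model, *weighted_average(rows))
```

This reproduces 0.93 / 0.84 / 0.88 for Naive Bayes and 0.95 / 0.95 / 0.95 for XGBoost, matching both reports. It also shows why avg / total looks so good: "Less than 50k" has about 15 times the support of "More than 50k", so the bottom row mostly reflects the majority class. A plain unweighted mean of the Naive Bayes precisions would be only (0.98 + 0.24) / 2 = 0.61.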

Regards
Raju


#2

@pudkeaayush

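Basic interpretation (classification_report computes each row by treating that row's class as the positive class) -
Precision - Of all the instances the model predicts as positive, the percentage that are actually positive.
Recall - Of all the instances that are actually positive, the percentage that the model catches (predicts as positive).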
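In terms of raw counts, with true positives (TP), false positives (FP) and false negatives (FN) counted for whichever class is treated as positive, the two definitions reduce to simple ratios. This is a minimal sketch; the counts below are made up purely for illustration and are not taken from the project's data.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model catches: TP / (TP + FN)."""
    return tp / (tp + fn)


# Hypothetical counts for illustration only: the model labels 100 people as
# "More than 50k" and 77 of them really are; 208 people truly earn more than
# 50k, so the model misses 208 - 77 = 131 of them.
tp, fp, fn = 77, 23, 131
print(round(precision(tp, fp), 2))  # 0.77
print(round(recall(tp, fn), 2))     # 0.37
```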

So if XGBoost's precision for "More than 50k" is 0.77, then out of every 100 instances your model predicts as more than 50k, about 77 are actually more than 50k (correct predictions).

Similarly, XGBoost's recall for "More than 50k" is 0.37, which means that out of every 100 instances in your data that actually are more than 50k, the model predicts only about 37 as more than 50k; the other 63 are misclassified as less than 50k.
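Because support is the number of instances that truly are "More than 50k" (TP + FN), the numbers in the report are enough to back out approximate counts for that class. The sketch below does this for both models; the helper approx_counts is hypothetical, and since precision and recall are rounded to two decimals the resulting counts are only estimates.

```python
def approx_counts(precision, recall, support):
    """Estimate (TP, FP, FN) for one class from rounded report values.

    support = TP + FN (actual members of the class),
    recall = TP / support, precision = TP / (TP + FP).
    """
    tp = round(recall * support)           # actual members the model caught
    predicted_pos = round(tp / precision)  # everything labelled as this class
    fp = predicted_pos - tp                # labelled as this class, but wrong
    fn = support - tp                      # actual members the model missed
    return tp, fp, fn


# "More than 50k" rows from the two reports
print("XGBoost    ", approx_counts(0.77, 0.37, 6186))
print("Naive Bayes", approx_counts(0.24, 0.72, 6186))
```

As (TP, FP, FN) this gives roughly (2289, 684, 3897) for XGBoost and (4454, 14104, 1732) for Naive Bayes. Naive Bayes catches more of the high earners (higher recall) at the cost of about 14,000 false alarms (low precision), while XGBoost raises far fewer false alarms but misses most of the actual high earners.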

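The F-measure (F1-score) is the harmonic mean of precision and recall, F1 = 2 * precision * recall / (precision + recall), so it doesn't have as direct an interpretation; it is high only when both precision and recall are high. Formula -> https://en.wikipedia.org/wiki/F1_score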
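To see how the harmonic mean behaves, the sketch below recomputes the F1 column from the precision and recall values in the reports. harmonic_f1 is just a local helper written for this example, not a scikit-learn function.

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # both zero: define F1 as 0 instead of dividing by zero
    return 2 * precision * recall / (precision + recall)


print(round(harmonic_f1(0.77, 0.37), 2))  # XGBoost, More than 50k: 0.5
print(round(harmonic_f1(0.24, 0.72), 2))  # Naive Bayes, More than 50k: 0.36
print(round(harmonic_f1(0.98, 0.85), 2))  # Naive Bayes, Less than 50k: 0.91
# The arithmetic mean of 0.77 and 0.37 would be 0.57; the harmonic mean
# is pulled toward the smaller of the two values.
```

Recomputing from the two-decimal values can differ from the report by 0.01, because the report works from unrounded precision and recall; XGBoost's "Less than 50k" row, for instance, gives 0.97 here versus 0.98 in the table.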

Hope this helps.

Regards,
Aayush