The evaluation metric is F1-score, not accuracy. Check the F1-score on the validation set.
I trained a logistic regression with both BoW and TF-IDF features. Then I optimized the hyperparameters with grid search, setting the scoring method to F1-score. I get an F1-score of 0.629 and an accuracy of 95.7%. That looks really good, but my submission gets a score of 0.12?!
I understand the evaluation is based on F1-score, but should I optimize the models based on their F1-score or their accuracy? Is that why my final result looks weird?
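For reference, a minimal sketch of the setup described above (logistic regression tuned by grid search with F1 as the scoring metric). The data here is synthetic stand-in data from make_classification, not the competition data, and the parameter grid is just an illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Synthetic, imbalanced stand-in for the tweet features (93% / 7% split).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.93, 0.07], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Grid search scored by F1, as described in the post.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1, 10]},
                    scoring="f1", cv=5)
grid.fit(X_tr, y_tr)

pred = grid.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("F1:", f1_score(y_te, pred))
```

With an imbalanced label distribution like this, it is normal for accuracy to come out far higher than F1, which is exactly the pattern in the numbers above.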
Thanks for taking the time to reply! However, I'm not sure I understand.
I used a test size of 0.25 and performed the grid search optimization with the parameter scoring='f1'. The classifier's F1-score is 0.629 (versus <0.55 before optimization).
So how is this a case of overfitting? Should I use a test size larger than 0.25 in train_test_split?
I've been trying to find the code of some of the top people on the leaderboard, but can't find any to compare with mine. Does anyone know a user who has shared their code?
Hello, I'm just finishing up the "Comprehensive Hands on Guide to Twitter Sentiment Analysis with dataset and code" and at the end I got stuck on the line "prediction_int = prediction[:,1] >= 0.3 # if prediction is greater than or equal to 0.3 than 1 else 0". Why do we pick 0.3 as the threshold? It seems rather low, and I can't see where in the problem statement it is specified.
@jokezor feel free to use any value in place of 0.3 as the threshold. You can also use a ROC curve to arrive at a threshold value.
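A sketch of one common way to pick a threshold from the ROC curve, using Youden's J statistic (maximize TPR − FPR on validation data). The data and model here are illustrative stand-ins, not the tutorial's; Youden's J is just one possible criterion:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Stand-in imbalanced data and classifier.
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# ROC curve over candidate thresholds; pick the one maximizing TPR - FPR.
proba = clf.predict_proba(X_va)[:, 1]
fpr, tpr, thresholds = roc_curve(y_va, proba)
best = thresholds[np.argmax(tpr - fpr)]
print("chosen threshold:", best)

# Apply it the same way as the guide's line does with 0.3:
pred_int = (proba >= best).astype(int)
```

Depending on the class balance, the threshold that works best on the F1 metric is often below 0.5, which is why values like 0.3 show up in such tutorials.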
Just one thing: I thought the threshold means we are only 30% sure that each tweet is classified correctly? Is 0.3 a typical value for problems like Twitter sentiment analysis?