Welcome to Practice Problem : Twitter Sentiment Analysis




This will be the official thread for any discussion related to the practice problem. Feel free to ask questions, share approaches and learn.


Where can I get the data?
Under the data files section, I can't find links for test_tweets.csv and train.csv.


Hey @jamilur,

You will find the links to download data just below the Evaluation Metric section.

Sanad 🙂


In the training sample, all the tweets are labeled as 1. Why is this so?



Kindly check again: the training file contains both 0 and 1 labels.
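One quick way to confirm this is to count the label values directly. Below is a minimal sketch using only the Python standard library. It assumes train.csv has a header row with a column named `label`, and the sample rows are made up for illustration, not taken from the dataset:

```python
import csv
import io
from collections import Counter


def count_labels(lines, label_column="label"):
    """Count how often each value appears in the label column of a CSV.

    label_column="label" is an assumption about the train.csv header.
    Values are counted as strings, exactly as they appear in the file.
    """
    reader = csv.DictReader(lines)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for train.csv (hypothetical rows, not real data).
sample = io.StringIO(
    "id,label,tweet\n"
    "1,0,first example tweet\n"
    "2,1,second example tweet\n"
    "3,0,third example tweet\n"
)
counts = count_labels(sample)
print(counts)
```

To run it on the real file, pass an open file object instead, e.g. `open("train.csv", newline="", encoding="utf-8")`; the encoding may need adjusting if the file was saved differently.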


Thanks NSS. Below is the classification report I get on the test set (an 80:20 split of the training data). Could you please guide me on how to improve it? I can't use NLTK or spaCy, so this is scikit-learn only.
             precision    recall  f1-score   support

          0       0.96      0.99      0.98      7433
          1       0.87      0.47      0.61       558

avg / total       0.95      0.96      0.95      7991
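Reading the report: for class 1 the recall is 0.47, so the model catches fewer than half of the class-1 tweets in the test split. Because F1 is the harmonic mean of precision and recall, that low recall pulls the class-1 F1 down to 0.61 even with 0.87 precision. The sketch below computes the per-class metrics by hand in plain Python (not scikit-learn's implementation) so the arithmetic behind the report is visible:

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Class 1 figures from the report: precision 0.87, recall 0.47.
print(round(f1_from(0.87, 0.47), 2))  # 0.61

# Toy labels (hypothetical): one hit, one false alarm, two misses for class 1.
print(precision_recall_f1([0, 0, 1, 1, 1], [0, 1, 1, 0, 0]))
```

Running the same function with `positive=0` gives the class-0 row; the "avg / total" line in older scikit-learn reports is the support-weighted average of the per-class rows.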


So far, my guess is that this is because of the imbalanced data; please correct me if I'm wrong. Could someone also share a link on how to treat imbalanced data in text analytics?
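Imbalance is a plausible cause: class 1 is only 558 of the 7991 test tweets (about 7%). One common remedy that needs nothing beyond scikit-learn is to reweight the classes during training. The sketch below reproduces in plain Python the `n_samples / (n_classes * class_count)` heuristic that scikit-learn documents for `class_weight="balanced"`, applied to the support counts from the report:

```python
def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Same heuristic scikit-learn uses for class_weight="balanced":
    rare classes get weights above 1, common classes below 1.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Supports from the report: 7433 tweets of class 0, 558 of class 1.
labels = [0] * 7433 + [1] * 558
weights = balanced_class_weights(labels)
print({c: round(w, 2) for c, w in weights.items()})  # {0: 0.54, 1: 7.16}
```

In scikit-learn itself you would pass `class_weight="balanced"` to an estimator such as `LogisticRegression` or `LinearSVC` instead of computing the weights by hand, and `train_test_split(..., stratify=y)` keeps the class ratio the same on both sides of the 80:20 split. Whether reweighting actually raises class-1 F1 on this data has to be checked on the held-out split.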




I need help writing the R/Python code. I have previously done regression and classification problems with categorical/numerical attributes, but this time the attribute is the tweet text itself. It contains noise such as punctuation, hashtags and stop words. Do we need to remove stop words using NLP techniques before doing the classification?
Could you please share the code?
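Stop-word removal does not require NLTK or spaCy. Below is a minimal cleaning sketch using only Python's `re` module. It assumes user handles appear as `@mentions`, keeps hashtag words while dropping the `#`, and uses a small hypothetical stop-word list (`STOP_WORDS`) purely for illustration:

```python
import re

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of",
              "in", "on", "for", "i", "you", "it", "this"}

URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")
NON_LETTERS = re.compile(r"[^a-z\s]")  # applied after lowercasing


def clean_tweet(text, stop_words=STOP_WORDS):
    """Lowercase, strip URLs/@mentions/punctuation, keep hashtag words, drop stop words."""
    text = text.lower()
    text = URL.sub(" ", text)          # remove links first
    text = MENTION.sub(" ", text)      # remove @user handles
    text = NON_LETTERS.sub(" ", text)  # '#sunny' -> ' sunny', '!!!' -> spaces
    tokens = [t for t in text.split() if t not in stop_words]
    return " ".join(tokens)


print(clean_tweet("@user Loving this #sunny weather!!! http://t.co/xyz"))
# loving sunny weather
```

The cleaned strings can then go into scikit-learn's `CountVectorizer` or `TfidfVectorizer`; `TfidfVectorizer(stop_words="english")` can also drop stop words itself using scikit-learn's built-in English list. Note that this regex removes every character outside a-z, so digits, accented letters and emoji are lost, which may or may not matter for this task.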