This category is for asking questions, discussing solutions, and helping each other solve the sentiment analysis practice problem.
In the training sample, all tweets are labeled as 1. Why is this so?
In the training data, several tweets that denounce racists are marked as hate speech. This seems to defeat the purpose of developing an algorithm based on this data.
I created a sparse matrix of shape (31962, 25000) from the training data set using CountVectorizer and trained the classifier.
Now I want to predict on the test data to submit the solution, but it is a sparse matrix of shape (171197, 21192).
As far as I know, the train and test matrices should have the same number of columns. How should I proceed?
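Use the same vectorizer instance that you fitted on the training data. For example, for the training data set you might have used:
vec = CountVectorizer()
Reuse the same “vec” for the test data without initializing or fitting it again, so that the test matrix gets the same columns as the training matrix.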
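Here is a minimal sketch of what reusing the vectorizer could look like. The toy tweets, the labels and the choice of LogisticRegression are my own assumptions for illustration, not the competition data or the classifier from the question. The key point is calling fit_transform on train and only transform on test:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the real train/test tweets.
train_tweets = [
    "i love this phone",
    "this movie is awful",
    "what a great day",
    "i hate waiting in line",
]
train_labels = [0, 1, 0, 1]
test_tweets = ["what an awful phone", "great movie"]

vec = CountVectorizer()

# Learn the vocabulary from the training data only and build the train matrix.
X_train = vec.fit_transform(train_tweets)

# Reuse the fitted vocabulary: transform() does not refit, so the columns match.
# Words that never appeared in the training data are simply ignored.
X_test = vec.transform(test_tweets)

# Assumed classifier; any model trained on X_train works the same way here.
clf = LogisticRegression()
clf.fit(X_train, train_labels)
predictions = clf.predict(X_test)

print(X_train.shape, X_test.shape)
```

Both matrices should come out with the same number of columns, because the column set is fixed by the vocabulary learned in fit_transform. Calling fit_transform again on the test data would build a new vocabulary, which is the likely cause of the mismatched shapes in the question.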