About the Practice_Sentiment category


#1

This category was created for asking questions, discussing solutions, and helping each other solve the sentiment analysis practice problem.


#2

In the training sample, all tweets are labeled as 1. Why is this so?


#3

In the training data, several tweets that denounce racists are marked as hate speech. This seems to defeat the purpose of developing an algorithm based on this data.


#4

I created a sparse matrix of shape (31962, 25000) from the training data set using CountVectorizer and trained the classifier on it.
Now I want to predict on the test data to submit the solution, but its sparse matrix has shape (171197, 21192).
As far as I know, the train and test matrices should have the same number of columns. How should I proceed?


#5

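Use the same vectorizer instance that you fitted on the training data. For example, for the training data set you might have used:
from sklearn.feature_extraction.text import CountVectorizer
vec = CountVectorizer()
X_train = vec.fit_transform(trainData)  # learns the vocabulary and builds the training matrix

Then reuse the same “vec”, without initializing or fitting it again, for the test data:
X_test = vec.transform(testData)  # reuses the training vocabulary, so the column count matches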
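
Here is a minimal end-to-end sketch of the same idea, in case it helps. The toy tweets, the labels, and the choice of LogisticRegression are my own assumptions for illustration, not the competition data or a recommended model. The point is only that a vectorizer fitted once on the training text gives train and test matrices with the same number of columns.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the real tweets and labels.
train_tweets = [
    "i love this sunny day",
    "what a hateful thing to say",
    "happy birthday to you",
    "such a hateful ugly remark",
]
train_labels = [0, 1, 0, 1]
test_tweets = [
    "a lovely sunny birthday",
    "hateful words again",
    "never seen words here",
]

vec = CountVectorizer()
X_train = vec.fit_transform(train_tweets)  # vocabulary is learned from the training text only
X_test = vec.transform(test_tweets)        # same vocabulary; unseen words are ignored

# Any classifier that accepts sparse input would do; this one is an assumption.
clf = LogisticRegression()
clf.fit(X_train, train_labels)
predictions = clf.predict(X_test)

# Row counts differ, column counts match.
print(X_train.shape, X_test.shape)
```

If you call fit_transform on the test data instead, the vectorizer builds a new vocabulary from the test tweets, which is why the column count changes and the classifier can no longer be applied.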