Text mining DTM

text_mining

#1

I have built a document-term matrix (DTM) from a set of reviews. It is roughly a 650*1000 matrix, i.e. 650 words and 1000 reviews. I have attached labels to this DTM, which are known beforehand.
I have now trained a classification algorithm on this matrix.
I now have a new review with 20 words.

How can I predict its class?


#2

Hello @raviteja1993,

You need to have label data as the target value for every instance (row) of your DTM matrix, and then apply machine learning models to it. Please elaborate so I can guide you further. Thanks!


#3

@Shaz13 For example, let's take a problem where I need to find the sentiment of a given review of a mobile network.
My problem is to build a model that takes a review as input and returns its sentiment as output…


#4

In that case you should proceed with data cleaning and stopword removal, and then convert the text into vectors using either CountVectorizer or TfidfVectorizer. Then feed these vectors to a model along with the proper labels (you need to have labels for the training data beforehand).
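
To make the steps above concrete for the original question (a new 20-word review against a 650-word DTM), here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the stopword list, the toy reviews, and every function name (`tokenize`, `build_vocabulary`, `vectorize`, `train_naive_bayes`, `predict`) are made up for this example, and the hand-rolled Naive Bayes stands in for whatever classifier you actually trained. The key idea is that the new review is mapped onto the same vocabulary (the same columns) as the training DTM, and words the model never saw are simply dropped.

```python
import math
import re
from collections import Counter

# Illustrative stopword list only; a real project would use a fuller one.
STOPWORDS = {"the", "is", "a", "and", "i", "it", "this", "to", "of"}

def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def build_vocabulary(reviews):
    """Map each distinct training token to a fixed column index."""
    tokens = sorted({t for r in reviews for t in tokenize(r)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(review, vocab):
    """Count vector over the training vocabulary; unseen words are ignored."""
    row = [0] * len(vocab)
    for tok, n in Counter(tokenize(review)).items():
        if tok in vocab:
            row[vocab[tok]] = n
    return row

def train_naive_bayes(rows, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing (log-probabilities)."""
    model = {}
    for label in set(labels):
        class_rows = [r for r, y in zip(rows, labels) if y == label]
        term_totals = [sum(col) + alpha for col in zip(*class_rows)]
        denom = sum(term_totals)
        model[label] = (
            math.log(len(class_rows) / len(rows)),        # class prior
            [math.log(t / denom) for t in term_totals],   # per-term likelihoods
        )
    return model

def predict(row, model):
    """Return the label with the highest log-posterior score."""
    def score(label):
        prior, log_probs = model[label]
        return prior + sum(n * lp for n, lp in zip(row, log_probs))
    return max(model, key=score)

# Toy training data (hypothetical), one label per review (row).
train_reviews = [
    "great network and fast data",
    "fast and reliable coverage",
    "terrible network, calls drop",
    "slow data and terrible support",
]
train_labels = ["positive", "positive", "negative", "negative"]

vocab = build_vocabulary(train_reviews)
dtm = [vectorize(r, vocab) for r in train_reviews]  # rows = reviews, columns = words
model = train_naive_bayes(dtm, train_labels)

# The new review reuses the training vocabulary; "but" and "roaming" are unseen.
new_review = "The data is fast but roaming is great"
new_row = vectorize(new_review, vocab)
prediction = predict(new_row, model)
print(prediction)
```

With scikit-learn the same pattern is `fit_transform` on the training reviews and `transform` (not `fit_transform`) on the new review, so that the new row has exactly the columns the model was trained on.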


#5

@Shaz13
Thank you,

Can you suggest a function in R similar to CountVectorizer?
I am trying to find a function or library in R that can build the matrix as efficiently as the Python one.


#6

Sorry @raviteja1993,

I am no expert in R. However, @pjoshi15 can guide you well here :slight_smile:


#7

@Shaz13 Thank you :grinning:


#8

@raviteja1993 I’d recommend using the tidytext package to practice NLP in R.