Text mining DTM



I have built a DTM (document-term matrix) from a set of reviews; it is a 650*1000 matrix, i.e. 650 words and 1000 reviews. I have attached labels to this DTM, which are known in advance.
I have now trained a classification algorithm on this matrix.
I now have a new review with 20 words.

How can I predict its class?


Hello @raviteja1993,

You need label data as the target value for every instance (row) of your DTM matrix, and then you can apply machine learning models to it. Please elaborate so I can guide you further. Thanks!


@Shaz13 For example, let's take a problem where I need to find the sentiment of a given review of a mobile network.
My problem is to build a model that takes a review as input and returns its sentiment as output…


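In that case you should proceed with data cleaning and removing stopwords, and then convert the text into vectors using either CountVectorizer or TfidfVectorizer. Then feed these vectors to a model along with the proper labels (you need to have labels for the training data beforehand). To classify a new review, transform it with the same fitted vectorizer, so that it gets the same columns as the training DTM, and pass the resulting vector to the trained model.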
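Here is a minimal sketch of those steps, written with only the Python standard library instead of scikit-learn so that each step is visible. The toy reviews, the labels, the stopword list, and the helper names (`tokenize`, `vectorize`, `train_nb`, `predict`) are all made up for illustration, and the classifier is a simple multinomial Naive Bayes that I picked as an example, not necessarily the model you trained.

```python
import math
import re

# Hypothetical toy data; in practice these are your labelled reviews.
train_reviews = [
    "The network coverage is great and the speed is fast",
    "Great service and fast internet",
    "Terrible coverage and the calls keep dropping",
    "The internet is slow and the service is terrible",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Illustrative stopword list only; a real one would be much longer.
STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


# The vocabulary is fixed at training time; it defines the DTM columns.
vocabulary = sorted({w for r in train_reviews for w in tokenize(r)})
index = {w: i for i, w in enumerate(vocabulary)}


def vectorize(text):
    """Count vector over the training vocabulary; unseen words are ignored."""
    vec = [0] * len(vocabulary)
    for w in tokenize(text):
        if w in index:
            vec[index[w]] += 1
    return vec


def train_nb(rows, labels):
    """Multinomial Naive Bayes with add-one smoothing."""
    model = {}
    for label in set(labels):
        class_rows = [r for r, l in zip(rows, labels) if l == label]
        totals = [sum(col) for col in zip(*class_rows)]  # word counts per class
        denom = sum(totals) + len(vocabulary)
        model[label] = (
            math.log(len(class_rows) / len(rows)),        # log prior
            [math.log((t + 1) / denom) for t in totals],  # log likelihoods
        )
    return model


def predict(model, text):
    """Vectorize a new review with the training vocabulary and score each class."""
    vec = vectorize(text)
    scores = {
        label: prior + sum(c * lp for c, lp in zip(vec, log_probs))
        for label, (prior, log_probs) in model.items()
    }
    return max(scores, key=scores.get)


dtm = [vectorize(r) for r in train_reviews]  # one row per review
model = train_nb(dtm, train_labels)

new_review = "Coverage is terrible and the internet is slow here"
print(predict(model, new_review))
```

The point to notice is that `vectorize` reuses the vocabulary built from the training reviews, so a 20-word review still becomes a vector with one entry per training word, and words the model has never seen (such as "here") are simply dropped. With scikit-learn the equivalent would be calling `fit_transform` on the training text and only `transform` on the new review.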


Thank you,

Can you suggest a function in R similar to CountVectorizer?
I am trying to find a function or library in R that can build the matrix as efficiently as in Python.


Sorry @raviteja1993,

I am no expert in R. However, @pjoshi15 can guide you well here :slight_smile:


@Shaz13 Thank you :grinning:


@raviteja1993 I'd recommend using the tidytext package to practice NLP in R.