Text-based prediction


I have the last 2 years of data, with a component and a headline for each record. There are 2035 unique components.

I need to build a prediction model that learns from the 2 years of data and then predicts which component a newly entered line of text points to.

Can someone point me to the tools and methodologies I should focus on in R for this?



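You can use a classification algorithm for this, with the component as the label and the headline as the input. You will also need some NLP techniques, such as n-grams, to turn the text into features. In Python I would use scikit-learn's CountVectorizer to tokenize the headlines and count the n-grams, then fit a classification algorithm on those counts. Take this with a pinch of salt, as I am still a noob.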
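
To make that concrete, here is a minimal sketch of the idea in Python with scikit-learn. The headlines and component names below are made up for illustration (they are not from your data), and Multinomial Naive Bayes is just one reasonable first classifier to try, not necessarily the best one for 2035 components.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: one headline and one component per record.
headlines = [
    "login page crashes on submit",
    "password reset email not sent",
    "invoice total shows wrong tax",
    "payment fails with expired card",
    "search results load slowly",
    "search filter ignores date range",
]
components = ["auth", "auth", "billing", "billing", "search", "search"]

# CountVectorizer tokenizes each headline and counts unigrams and bigrams;
# MultinomialNB is then fitted on those counts.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    MultinomialNB(),
)
model.fit(headlines, components)

# Predict the component for a newly entered line of text.
new_headline = "password reset link broken"
prediction = model.predict([new_headline])[0]
print(prediction)
```

With real data you would hold out part of the 2 years as a test set before trusting the predictions, since many of the 2035 components will probably have only a few examples each.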