I am implementing LDA on incident ticket descriptions, using R.
My approach is as follows:
.csv > corpus > cleaning (remove punctuation, stop words and numbers, convert to lowercase, etc.) > stemming > DTM > find the number of topics (k) using the harmonic mean (hmean) > apply topicmodels::LDA on the DTM > check the topics and their terms > visualize using LDAvis.
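For reference, the steps above can be sketched roughly as follows. This is a minimal sketch, not my exact code: the `tickets` data frame and its `description` column are made-up stand-ins for the real .csv, `k <- 3` is a placeholder for the value chosen with the harmonic mean, and the Gibbs settings are assumptions. Stemming via `stemDocument` needs the SnowballC package installed. The k selection and the LDAvis step are left out.

```r
library(tm)           # corpus handling and cleaning
library(topicmodels)  # LDA()

# Hypothetical stand-in for the ticket descriptions read from the .csv
tickets <- data.frame(
  description = c("Printer not printing after driver update",
                  "Cannot connect to VPN from home network",
                  "Password reset required for email account",
                  "VPN connection drops every 10 minutes",
                  "Email account locked after 3 failed password attempts",
                  "Printer driver installation fails on laptop"),
  stringsAsFactors = FALSE
)

# Build the corpus and clean it
corpus <- VCorpus(VectorSource(tickets$description))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)   # requires SnowballC
corpus <- tm_map(corpus, stripWhitespace)

# Document-term matrix; drop documents that became empty after cleaning,
# because LDA() does not accept rows with no terms
dtm <- DocumentTermMatrix(corpus)
dtm <- dtm[slam::row_sums(dtm) > 0, ]

# Placeholder: in the real pipeline k comes from the harmonic-mean search
k <- 3
fit <- LDA(dtm, k = k, method = "Gibbs",
           control = list(seed = 42, iter = 500))

top_terms  <- terms(fit, 5)   # top 5 terms per topic (matrix, 5 x k)
doc_topics <- topics(fit)     # most likely topic for each document
print(top_terms)
```

The repeated words from question 1 show up in `top_terms`, where the same stem can appear in more than one column.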
Now my questions are:
1. Many words are repeated across topics. How should I interpret this, and how can I remove this overlap between topics?
2. How do I give names to the topics based on the text?
3. How do I check the accuracy of the topic model, and how do I test it on a test data set?
4. Can I apply SVM, Naive Bayes, XGBoost, etc. to the output of LDA to classify new incident tickets?
5. How do I deploy it to a server so that I can see my model working in the real world?
Has anyone heard of implementations of TWC-LDA, NMF, or t-SNE in R?
Kindly answer each point with an approach/code in R.
Since this is a live project that I am working on, I would appreciate a quick reply.