What are the ways and techniques to become good at Text Analytics?




What ways and techniques should we use to become good at Text Analytics? As far as I know, it is more about feature engineering and less about machine learning skills. But what is the right way to tackle problems in text analytics?



There’s a course running right now on Coursera which you could join:

It's called “Text Mining and Analytics”. It's a free online course offered by the University of Illinois.


Hi Steve,

The best way to learn feature engineering is through practice. You can head over to https://datahack.analyticsvidhya.com/contest/all/practice to practise text problems, and refer to the relevant NLP articles on our site. Thanks!


Hi Steve

To become GOOD at Text Analytics, work with libraries such as NLTK, pandas and scikit-learn, and use some practice problems to get a sense of which features work in practice, such as bag of words and POS tags.
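To make "bag of words" concrete, here is a minimal sketch using only the Python standard library. The function names `tokenize` and `bag_of_words` and the two sample sentences are made up for illustration; in practice you would more likely reach for scikit-learn's `CountVectorizer`, which builds the same kind of document-term matrix.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(docs):
    # Count the tokens in each document separately.
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is every distinct token, sorted for a stable column order.
    vocab = sorted(set().union(*counts))
    # One row per document, one column per vocabulary word.
    matrix = [[c[word] for word in vocab] for c in counts]
    return vocab, matrix

# Hypothetical sample documents.
docs = ["The movie was good", "The movie was not good at all"]
vocab, matrix = bag_of_words(docs)
print(vocab)
for row in matrix:
    print(row)
```

Notice that the two sentences share most of their columns even though they mean opposite things. Word order is thrown away, which is one reason it is worth testing several feature types on a real problem.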

To become GREAT at Text Analytics, start with a project. It can be something like sentiment analysis of Twitter feeds:

  1. Create an API to collect Twitter feeds at scale, or get a dump of data from the internet
  2. Use every feature you can think of on the REAL WORLD problem, whether bag of words, POS tags, etc.
  3. Try to label some tweets manually, and appreciate that you can’t label a lot
  4. Try some other algorithms, such as semi-supervised ones
  5. Build some initial models
  6. Iterate over them again and again, and see which segments give good results and which don’t
  7. Compare with other available engines, and research their features, models, etc.
  8. Try to incorporate what you learn into your product
  9. If it gives great results, try creating APIs to sell the product
  10. Launch the product
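
Steps 3 to 5 can be sketched roughly as follows. The steps above don't name a particular semi-supervised algorithm, so this uses self-training as one common option, wrapped around a tiny hand-written naive Bayes classifier. The function names, the 0.75 confidence threshold and the sample tweets are all assumptions made for illustration, not a recommended configuration.

```python
import math
from collections import Counter, defaultdict

def train(labeled):
    # Count documents per label and words per label.
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab

def predict(model, text):
    # Multinomial naive Bayes with add-one smoothing.
    # Returns (most likely label, its probability); unseen words are ignored.
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label, n_docs in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

def self_train(labeled, unlabeled, threshold=0.75, max_rounds=5):
    # Repeatedly train, then adopt confident predictions as new labels.
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            label, prob = predict(model, text)
            if prob >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled, pool

# Hypothetical data: two hand-labeled tweets and three unlabeled ones.
seed = [("love this great phone", "pos"), ("hate this terrible battery", "neg")]
unlabeled = ["great camera love it", "terrible screen hate it", "arrived on tuesday"]
model, labeled, pool = self_train(seed, unlabeled)
print(labeled)
print(pool)
```

Tweets the model is unsure about stay in `pool`, which makes them natural candidates for the next round of manual labeling. A wrong pseudo-label gets reinforced in later rounds, so the threshold is something to tune during the iteration in step 6.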

After you have successfully developed a couple of decent engines in real-world situations, you can apply that expertise to different problems.