Multi-label text classification in Python / R


In our existing system, we have 11 different interest areas (like fashion, sports, food…).

Based on text features, we need to tag each text with the relevant interest areas. A single text may need several area tags, so overall this looks like a multi-label text classification problem.

My question is: how can I get started? Please suggest an approach, or some test data I can use for reference.



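You can calculate the probability of each tag separately with a one-vs-all (one-vs-rest) method, i.e. one binary classifier per tag that scores "this tag" against all the others. Then choose the top k tags based on those probabilities.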
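Here is a minimal sketch of the one-vs-all idea in plain Python, with no third-party libraries. Assumptions: each per-tag model is a tiny multinomial Naive Bayes, chosen only for brevity (any binary classifier that outputs a probability would work); tokenization is plain whitespace splitting; the names `BinaryNB`, `train_one_vs_rest` and `top_k_tags`, as well as the toy texts and tags, are hypothetical and not a real dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real data needs better preprocessing)
    return text.lower().split()


class BinaryNB:
    """Hypothetical multinomial Naive Bayes for one tag vs. all other texts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        # labels[i] is True if docs[i] carries this tag
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def prob(self, doc):
        """Return P(tag | doc), scored in log space and normalised over {tag, not tag}."""
        n = sum(self.n_docs.values())
        v = len(self.vocab)
        log_scores = {}
        for y in (True, False):
            # Smoothed prior, so an empty class never gives log(0)
            score = math.log((self.n_docs[y] + self.alpha) / (n + 2 * self.alpha))
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[y][tok] + self.alpha)
                                      / (self.totals[y] + self.alpha * v))
            log_scores[y] = score
        m = max(log_scores.values())
        pos = math.exp(log_scores[True] - m)
        neg = math.exp(log_scores[False] - m)
        return pos / (pos + neg)


def train_one_vs_rest(docs, tag_sets, tags, alpha=1.0):
    # One independent binary model per tag: "this tag" vs. "all the rest"
    return {t: BinaryNB(alpha).fit(docs, [t in ts for ts in tag_sets]) for t in tags}


def top_k_tags(models, doc, k=2):
    # Score every tag independently, then keep the k most probable ones
    probs = {t: m.prob(doc) for t, m in models.items()}
    return sorted(probs, key=probs.get, reverse=True)[:k]


# Hypothetical toy data: 3 of the interest areas; a text may carry several tags
docs = [
    "new sneakers and running shoes for the marathon",
    "summer dress collection and fashion week",
    "football match score and league table",
    "healthy recipes for a post workout meal",
    "restaurant review pasta and pizza",
    "designer shoes fashion trends",
]
tag_sets = [{"sports", "fashion"}, {"fashion"}, {"sports"},
            {"food", "sports"}, {"food"}, {"fashion"}]

models = train_one_vs_rest(docs, tag_sets, ["fashion", "sports", "food"])
print(top_k_tags(models, "fashion shoes", k=2))  # ['fashion', 'sports']
```

A fixed k forces exactly k tags onto every text. When texts can belong to a variable number of interest areas, a per-tag probability threshold (for example, keep every tag above 0.5) is a common alternative to top-k.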

Ankit Gupta


Hi @KumarP

To start with this multi-label classification problem, you can use Naive Bayes, which is simple to understand and implement. Despite its simplicity, it has been observed to outperform more complex algorithms on some text classification tasks. To get a good conceptual as well as implementation understanding, you can refer to the following guide: