Document Classification



Can anybody suggest good learning materials for text/document classification? I want to start from scratch.


Hi @Akash_Haldankar,

This is a good document to start with if you are looking to implement text classification in R.

This article will give you a detailed overview of the text classification / clustering algorithms.

Let me know if this helped.



Hi @debarati_dutta8

Thanks for the survey article.

Have a good day.



Thank you so much, Alain.



Thank you so much @debarati_dutta8


@debarati_dutta8 I know this isn’t related to the current topic, but can you help me find large datasets for binary classification?


Hi @Akash_Haldankar
Do you need text? For example, 160 MB of Twitter data, more than 100,000 tweets?


Thanks for replying, @Lesaffrea. I was thinking of structured data in the form of a table for binary classification (like the Adult or Titanic datasets, but with more data). I also need another dataset for binary document classification.


What about 7000 × 90? A binary output, and the other 89 columns a mix of numeric and binary?


@Lesaffrea Sorry, I didn’t get that.


Hi @Akash_Haldankar
Would a dataset of 7000 observations by 89 features be good enough?



@Lesaffrea The Adult dataset has 48,842 observations and 14 features, so I am looking for something larger than that. I also wanted an email classification dataset (like Fraud / NotFraud).


Hi @Akash_Haldankar

Sorry, Akash, I do not have one for binary classification.

Good luck.