Can anybody suggest good learning materials for text/document classification? I want to start from scratch.


This is a good document to start with if you are looking to implement text classification in R.

This article gives a detailed overview of text classification and clustering algorithms.
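If you want to see what "from scratch" can look like in code, here is a minimal sketch of one classic baseline, multinomial naive Bayes with add-one smoothing, written in plain Python with only the standard library. It is not the method from the R document or the article above. The class name `NaiveBayesTextClassifier`, the simple regex tokenizer, and the toy spam/ham data are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # total number of tokens seen in each class
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior of the class
            score = math.log(count / n_docs)
            for token in tokenize(doc):
                if token not in self.vocab:
                    continue  # ignore words never seen during training
                # smoothed log likelihood of the token given the class
                numerator = self.word_counts[label][token] + 1
                score += math.log(numerator / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with made-up documents
train_docs = [
    "cheap pills buy now",
    "win money now",
    "meeting agenda attached",
    "project report for the meeting",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("buy cheap money"))  # spam
print(clf.predict("meeting report"))   # ham
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and add-one smoothing keeps a word that never appeared in one class from driving that class's probability to zero. Once this makes sense, a library implementation will give you the same kind of model with more options.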

@debarati_dutta8 I know this isn't related to the current topic, but can you help me find large datasets for binary classification?


Do you need text? For example, about 160 MB of Twitter data with more than 100,000 tweets?


Thanks for replying, @Lesaffrea. I was thinking of structured data in table form for binary classification (like the Adult or Titanic dataset, but with more data). I also need another dataset for binary document classification.


What about a 7000 × 90 dataset? One binary output column, and the other 89 columns a mix of numeric and binary features?


@Lesaffrea Sorry, I didn't get that.


Would a dataset of 7,000 observations by 89 features be good enough?



@Lesaffrea The Adult dataset has 48,842 observations and 14 features, so I am looking for something larger than that. I also want an email classification dataset (e.g. fraud / not fraud).


