Document Classification

text_mining

#1

Can anybody suggest good learning materials for text/document classification? I want to start from scratch.


#2

Hi @Akash_Haldankar,

This is a good document to start with if you are looking to implement text classification in R.

This article will give you a detailed overview of the text classification / clustering algorithms.

Let me know if this helped.

Thanks,
Debarati.


#3

Hi @debarati_dutta8

Thanks for the survey article.

Have a good day

Alain


#4

Thank you so much, Alain.

Cheers,
Debarati.


#5

Thank you so much @debarati_dutta8


#6

@debarati_dutta8 I know this isn’t related to the current topic, but can you help me find large datasets for binary classification?


#7

Hi @Akash_Haldankar
Do you need text? For example, 160 MB of Twitter data with more than 100,000 tweets?
Alain


#8

Thanks for replying, @Lesaffrea. I was thinking of structured, tabular data for binary classification (like the Adult or Titanic datasets, but with more data). I also need another dataset for binary document classification.


#9

What about a 7,000 × 90 dataset? One binary output, with the other 89 columns a mix of numeric and binary features.
Alain


#10

@Lesaffrea Sorry, I didn’t get that.


#11

Hi @Akash_Haldankar
Would a dataset of 7,000 observations by 89 features be good enough?

Alain


#12

@Lesaffrea The Adult dataset has 48,842 observations and 14 features, so I am looking for something larger than that. I also want an email classification dataset (e.g., fraud / not fraud).


#13

Hi @Akash_Haldankar

Sorry, Akash, I don’t have one for binary classification.

Good luck

Alain