Extracting a paragraph using Text mining

text_mining

#1

Hi all,

I am a novice in text mining. I want to learn how to extract a paragraph from a very large document based on certain keywords. Can someone help?

Thank you


#2

@karthe1

Do you mean that you have a document and want to extract the paragraphs that contain a given set of keywords?

What format is the document in? Is it just a set of keywords, or does something like the order of the keywords matter? If it is just a set of keywords, you can convert the document into a frequency table for each paragraph and then pick the paragraphs in which the keywords occur.
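A minimal sketch of the frequency-table idea in R, assuming paragraphs are separated by blank lines and that matching whole words case-insensitively is good enough. The sample text and the names `doc`, `keywords`, `freq` and `matched` are made up for illustration.

```r
# Hypothetical sample document; paragraphs are separated by blank lines.
doc <- paste(
  "Analytics is growing fast.",
  "",
  "This paragraph is about cooking.",
  "",
  "Text analytics and web analytics are related.",
  sep = "\n"
)

# Split the document into paragraphs on blank lines.
paragraphs <- trimws(strsplit(doc, "\n\\s*\n")[[1]])

# Keywords to count (lower case, because the text is lower-cased below).
keywords <- c("analytics", "text")

# One row per paragraph, one column per keyword.
freq <- matrix(0L, nrow = length(paragraphs), ncol = length(keywords),
               dimnames = list(NULL, keywords))
for (i in seq_along(paragraphs)) {
  # Break the paragraph into lower-case words made of letters only.
  words <- strsplit(tolower(paragraphs[i]), "[^a-z]+")[[1]]
  freq[i, ] <- vapply(keywords, function(k) sum(words == k), integer(1))
}

# Keep the paragraphs in which "analytics" occurs at least once.
matched <- paragraphs[freq[, "analytics"] > 0]
print(freq)
print(matched)
```

If the order of the keywords matters, a table of counts like this is not enough and a pattern match on the paragraph text would be needed instead.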

Regards
Kunal


#3

Hi @kunal sir,

The document is in plain text format. The keyword is just a single word; let's call the string “Analytics”. I want to extract the text of all the paragraphs that refer to this string. I am using R.
I am trying a simple paragraph annotator as of now. I am not sure if this is correct.
Thank you.


#4

The following code works. Read the file with read.csv and as.is=TRUE, and replace 'novice' in str_extract_all with your keyword.

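install.packages('stringr')
library('stringr')
# Write the sample text to a file; it is stored as a single quoted CSV field
write.csv('Hi all,

I am a novice in text mining. I want to learn / know how to extract a paragraph from a very large document based on certain key words. Can some one help.

Thank you', 'text.txt', row.names = FALSE)
# Read it back as a character string (as.is = TRUE avoids conversion to a factor)
a <- read.csv('text.txt', as.is = TRUE)
# Match a single-line paragraph containing the keyword, with a blank line on each side
str_extract_all(as.character(a), "\n\n.*novice.*\n\n")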

Let me know if there is any problem. Note that this extracts each whole paragraph containing the given keyword, but because the pattern requires a blank line both before and after the paragraph, I don't think it will work for the first and the last paragraph. You can write similar patterns for those cases using str_extract_all.
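One possible way to cover the first and last paragraphs as well is to split the text on blank lines and test each paragraph separately, instead of matching the blank lines in the pattern. This is only a sketch in base R, not the method above: the sample text, the temporary file and the names `path`, `keyword` and `matches` are assumptions for illustration.

```r
# Write a hypothetical sample file so the sketch runs on its own.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Analytics is the topic of the first paragraph.",
  "",
  "An unrelated middle paragraph",
  "that spans two lines.",
  "",
  "The last paragraph mentions analytics too."
), path)

# Read the whole file into one string.
text <- paste(readLines(path, warn = FALSE), collapse = "\n")

# Split on blank lines and drop any empty pieces.
paragraphs <- trimws(strsplit(text, "\n\\s*\n")[[1]])
paragraphs <- paragraphs[nzchar(paragraphs)]

# Keep every paragraph that contains the keyword, ignoring case.
keyword <- "Analytics"
matches <- paragraphs[grepl(tolower(keyword), tolower(paragraphs), fixed = TRUE)]
print(matches)
```

Because each paragraph is tested as a whole, this also handles paragraphs that run over several lines. For a real document, replace `path` with the name of your own text file.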


#5

How can I cluster or classify resumes based on areas of expertise, experience, etc.? Also, how can I automatically retrieve the date of birth or age and the years of experience from thousands of resumes?


#6

Hi, I am facing the same problem. If you have found a solution, could you please help me with it?
Thank you