Exporting modified corpus with R

text_mining

#1

I wonder why this function, which takes a set of text files as input, tokenizes each one, and saves each under the same name, just resaves the files it read with no modification.

library(tm)          # provides Corpus, DirSource, writeCorpus
library(tokenizers)  # provides tokenize_ngrams

path <- "C:/test/"
corp <- Corpus(DirSource(path), 
                 readerControl=list(reader=readPlain, 
                                    language='en_CA',
                                    load=TRUE));
crop <- lapply(corp, function(x) tokenize_ngrams(x, n = 6, n_min = 1))
writeCorpus(corp)

#2

@azza00 could you share a sample data which would work with your code? Can’t really suggest anything right now.


#3

Hi, this is my code:
library(tm)

path <- "C:/Users/abidi/Desktop/testingSet/test/"

corp <- Corpus(DirSource(path),
               readerControl = list(reader = readPlain,
                                    language = 'en_CA',
                                    load = TRUE));

corp <- tm_map(corp, content_transformer(tolower))

crop <- lapply(corp, function(x) tokenize_ngrams(x, n = 6, n_min = 1))

writeCorpus(corp)

When I apply any function to this corpus (tolower, removing stopwords, …) and then export it, it works pretty well.

But when I apply the tokenization function, printing corp in RStudio shows me what I want; yet when I export the corpus, I don't find the token tables shown before, I find the same corpus with its plain text.
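For reference, a minimal sketch of one way to get the tokens onto disk, assuming the tm and tokenizers packages (the paths here are hypothetical). The key point is that lapply over a corpus returns a plain list, so writeCorpus(corp) never sees the tokenization; the token lists have to be written out explicitly, e.g. with writeLines:

```r
library(tm)
library(tokenizers)

# Build the corpus as in the code above (hypothetical input path)
path <- "C:/test/"
corp <- Corpus(DirSource(path),
               readerControl = list(reader = readPlain, language = 'en_CA'))

# tokenize_ngrams() expects a character vector, so extract the document
# text with content(); it returns a list with one element per input string
tokens <- lapply(corp, function(x)
  tokenize_ngrams(content(x), n = 6, n_min = 1)[[1]])

# writeCorpus() only writes the (unmodified) documents of a corpus,
# so write each document's n-grams manually, one output file per document
out_dir <- "C:/test/out"   # hypothetical output directory
for (id in names(tokens)) {
  writeLines(tokens[[id]], file.path(out_dir, paste0(id, ".txt")))
}
```

This keeps the original corpus intact and produces one text file of n-grams per input document.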

Below is a sample of the text I'm working on.