Another query on text mining.
I found a piece of code that randomly splits a data frame into 70% training and 30% test data. It ran successfully on my data frame (atac_raw):
dt=sort(sample(nrow(atac_raw),nrow(atac_raw)*.7))
atac_raw_train <- atac_raw[dt,]
atac_raw_test <- atac_raw[-dt,]
However, when I use the same code to split the corresponding corpus (corpus_clean), it fails. Maybe the code doesn’t work on corpus objects?
dt_corpus=sort(sample(nrow(corpus_clean),nrow(corpus_clean)*.7))
Error in sample.int(length(x), size, replace, prob) : invalid 'size' argument
Can anyone help? I couldn’t find a solution on the web.
I can think of a workaround (jugaad!): reorder the data file so that the first n records are my training data and the remaining records are my test data. But I would like to know if there is a way to fix the code itself so that it works on the corpus.
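One possible explanation, offered as a guess rather than a confirmed diagnosis: if corpus_clean is a list-like corpus object (as a tm corpus is), it has no rows, so nrow(corpus_clean) returns NULL. NULL * .7 is then an empty numeric vector, which sample() rejects as an invalid 'size'. The sketch below uses length() instead of nrow() and single-bracket indexing without a comma. The plain list stands in for the real corpus and its contents are made up for illustration; corpus_train and corpus_test are hypothetical names.

```r
# Stand-in for corpus_clean: a plain list of documents (made-up data).
# Like a list-like corpus, it has no dim attribute, so nrow() returns NULL.
corpus_clean <- list("doc one", "doc two", "doc three", "doc four", "doc five",
                     "doc six", "doc seven", "doc eight", "doc nine", "doc ten")

no_rows <- is.null(nrow(corpus_clean))  # TRUE for a list

set.seed(42)  # only so that the split is reproducible

# Use length() to count documents; floor() makes the sample size a whole number.
n <- length(corpus_clean)
dt_corpus <- sort(sample(n, floor(n * 0.7)))

# A list-like object is subset with [i], not [i, ].
corpus_train <- corpus_clean[dt_corpus]
corpus_test  <- corpus_clean[-dt_corpus]
```

If the corpus holds one document per row of atac_raw, in the same order, it may be preferable to reuse the existing index (corpus_clean[dt] and corpus_clean[-dt]) so that the data frame and corpus splits stay aligned.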