How to delete tables and figures from a set of docx files using R

text_mining
r

#1

Is there a way to parse a set of docx files one by one and delete all the tables and captions in them? I tried the officer package and found the deletion code below. It works by moving a cursor to the place where content should be deleted, but it only works with keywords; I couldn't find a parameter for targeting tables or figures.

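library(officer)
library(magrittr)  # provides the %>% pipe

# Move the cursor to the paragraph containing the keyword, then remove it
my_doc <- read_docx(path = "ipsum_doc.docx") %>%
  cursor_reach(keyword = "text  to delete") %>%
  body_remove()

# print() writes a Word document, so the target needs a .docx extension
print(my_doc, target = "ipsum1_doc.docx")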

#2

Hi @azza00

Instead of parsing the tables and deleting them, why not try parsing everything except the tables? Give it a try and let us know if it works.


#3

Hi,
thank you for your response, but can you give me a code pattern that I can work with?
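
A minimal sketch of the approach suggested in #2, not a tested solution for any particular set of files: officer's `docx_summary()` returns one row per paragraph or table cell, so keeping only the rows whose `content_type` is `"paragraph"` leaves the tables out. The helper names `extract_body_text()` and `convert_folder()` are hypothetical, and the caption filter assumes captions use a paragraph style whose name contains "caption", which depends on the template the documents were written with. Pictures carry no text in the summary, so they drop out once empty paragraphs are removed.

```r
library(officer)

# Hypothetical helper: text of one .docx file, without table cells
# and without caption-styled paragraphs.
extract_body_text <- function(path) {
  content <- docx_summary(read_docx(path))

  # Table content is reported as "table cell" rows; keep paragraphs only
  is_paragraph <- content$content_type %in% "paragraph"

  # Assumption: caption paragraphs use a style named like "... Caption"
  is_caption <- grepl("caption", content$style_name, ignore.case = TRUE)

  text <- content$text[is_paragraph & !is_caption]

  # Drop empty paragraphs (for example, ones that only hold a picture)
  text[!is.na(text) & nzchar(trimws(text))]
}

# Hypothetical helper: process every .docx file in in_dir and write
# one .txt file per document into out_dir.
convert_folder <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "\\.docx$", full.names = TRUE)
  for (f in files) {
    out_name <- paste0(tools::file_path_sans_ext(basename(f)), ".txt")
    writeLines(extract_body_text(f), file.path(out_dir, out_name))
  }
  invisible(files)
}
```

Calling `convert_folder("docs", "text_out")` (both folder names are placeholders) would then write one plain-text file per document. If the captions in your files use an ordinary paragraph style, the style filter will not catch them, and a pattern on the text itself, such as lines starting with "Table" or "Figure", may be needed instead.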