What are some libraries in R similar to the BeautifulSoup package in Python?

r
nlp
python

#1

Hi all,

Can someone suggest some good R libraries for Natural Language Processing tasks, such as removing HTML tags and other data cleaning, similar to the BeautifulSoup package in Python?

Thanks!


#2

@Mark,

You can use the scrapeR package to process data from HTML and XML documents.

See the package documentation here:

http://cran.r-project.org/web/packages/scrapeR/scrapeR.pdf

Hope this helps!
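As a minimal sketch of this kind of HTML processing, here is an example using the XML package, which is a separate CRAN package (scrapeR's own functions are not shown here). It assumes XML is installed, parses a small made-up HTML string, and uses an XPath query to pull out the text of every <p> element with the inner tags dropped.

```r
library(XML)

# Hypothetical HTML snippet standing in for a downloaded page
html_string <- "<html><body><p>First <b>para</b></p><p>Second</p></body></html>"

# asText = TRUE tells htmlParse() to treat the string as HTML, not a file path
doc <- htmlParse(html_string, asText = TRUE)

# xmlValue() returns the node's text content, including nested tags' text
paragraphs <- xpathSApply(doc, "//p", xmlValue)
paragraphs
```

This returns one character element per paragraph, e.g. "First para" for the first one.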


#3

Look at the rvest package, created by Hadley Wickham.

rvest helps you scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup.

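```r
library(rvest)  # rvest also re-exports the magrittr pipe %>%

# read_html() replaces the older html() function, which is deprecated
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")

# Select the rating with a CSS selector, extract its text, convert to a number.
# The selector matched IMDb's layout at the time and may no longer match.
rating <- lego_movie %>%
  html_nodes("strong span") %>%
  html_text() %>%
  as.numeric()
rating
#> [1] 7.8
```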
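For the cleaning task in the question (stripping tags from HTML you already have, rather than scraping a live page), a hedged sketch with rvest might look like this. It assumes rvest 0.3 or later, which uses the xml2 package under the hood, and the HTML string is a made-up example. Similar to BeautifulSoup's get_text(), html_text() returns the text with the markup removed; deleting <script> and <style> nodes first keeps their contents out of the result.

```r
library(rvest)  # loads read_html(), html_nodes(), html_node(), html_text() and %>%

# Hypothetical raw HTML to clean
raw_html <- paste0(
  "<html><head><style>p { color: red; }</style></head>",
  "<body><p>Hello <b>world</b>!</p>",
  "<script>var x = 1;</script></body></html>"
)

doc <- read_html(raw_html)

# Remove <script> and <style> elements so their contents are not extracted
xml2::xml_remove(html_nodes(doc, "script, style"))

# Extract the text of <body> with all tags stripped
clean_text <- doc %>%
  html_node("body") %>%
  html_text(trim = TRUE)
clean_text
```

Here clean_text is the plain string "Hello world!".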