Importing HTML values to R


#1

Hello Friends,

I am just trying to read values from HTML files using R.

I have around 74 tags like the one below in my input HTML, so I am expecting to read all 74 names into a character vector.

Manikandan

Below is what I tried in R:

library(rvest)

input_file <- read_html("C:/Users/MANI/Desktop/Yammer _ Data Scientists.html")

user_name <- input_file %>% html_node("div h3 span") %>% html_text()

user_name

Output:

library(rvest)

input_file <- read_html("C:/Users/MANI/Desktop/Yammer _ Data Scientists.html")

user_name <- input_file %>% html_node("div h3 span") %>% html_text()
Warning message:
In node_find_one(x$node, x$doc, xpath = xpath, nsMap = ns) :
74 matches for .//div/descendant-or-self::*/h3/descendant-or-self::*/span: using first

user_name
[1] " Manikandan "

When I print the vector, it shows only the first name, even though the warning says there are 74 matches.

Could anyone please let me know what I need to change to get all the names printed?

Thanks,
Mani N
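
For anyone hitting the same warning: the likely cause is that html_node() (singular) returns only the first match, while html_nodes() (plural) returns every match. Below is a minimal sketch, assuming a small inline page as a stand-in for the saved file; the markup and the names "User B" and "User C" are made up for illustration. In recent rvest versions the same pair is also available as html_element() and html_elements().

```r
library(rvest)

# Hypothetical stand-in for the saved page: three names instead of 74.
page <- read_html('<html><body>
  <div><h3><span> Manikandan </span></h3></div>
  <div><h3><span> User B </span></h3></div>
  <div><h3><span> User C </span></h3></div>
</body></html>')

# html_node() (singular) keeps only the first matching node.
first_name <- page %>% html_node("div h3 span") %>% html_text()

# html_nodes() (plural) keeps every matching node, so html_text()
# returns a character vector with one element per match.
# trim = TRUE strips the surrounding whitespace from each name.
all_names <- page %>% html_nodes("div h3 span") %>% html_text(trim = TRUE)

length(all_names)
all_names
```

With the real file, the same change (html_nodes in place of html_node) should give one element per matched span.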


#2

Hi All,

I was able to resolve the issue by using SelectorGadget.

https://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html

I modified the code as shown below, and it is working fine:

library(rvest)

input_file <- read_html("C:/Users/MANI/Desktop/Yammer _ Data Scientists.html")

# Get the list of users who posted in the group

user_name <- html_nodes(input_file, ".yj-thread-starter .yj-message-list-item--body-byline-user-link .yj-hovercard-link--name")

html_text(user_name)
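
To show why a class-based selector like the one SelectorGadget suggests can be more precise than a tag-only selector, here is a rough sketch. The markup and the class names (thread-starter, reply, user-link, name) are simplified, hypothetical stand-ins, not the real Yammer classes.

```r
library(rvest)

# Hypothetical page: two thread starters and one reply.
page <- read_html('<html><body>
  <div class="thread-starter"><a class="user-link"><span class="name">Manikandan</span></a></div>
  <div class="reply"><a class="user-link"><span class="name">User B</span></a></div>
  <div class="thread-starter"><a class="user-link"><span class="name">User C</span></a></div>
</body></html>')

# A chain of class selectors separated by spaces matches descendants,
# so this keeps only names that sit inside a thread-starter block.
starters <- page %>%
  html_nodes(".thread-starter .user-link .name") %>%
  html_text()

# A looser selector matches every name, replies included.
everyone <- page %>% html_nodes(".name") %>% html_text()

starters
everyone
```

The more specific the chain of classes, the fewer unrelated nodes end up in the result.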

Thank you


#3

Hello, I want the opposite.

Instead of getting all the values from html_nodes("h3 span"), I need just one, say the 4th value.

Any suggestions, please?


#4

Thanks, I found the solution.

I used:
library(magrittr)
GGFA0001 %>%
html_nodes("h3 span") %>% html_text() %>% extract2(4)
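
A small sketch of the same idea, in case it helps others. It assumes a made-up inline page in place of GGFA0001. extract2() from magrittr is an alias for `[[`, so the pipe version and plain indexing should give the same result; the length check is an optional extra for pages with fewer matches than expected.

```r
library(rvest)
library(magrittr)

# Hypothetical page with five h3/span values.
page <- read_html('<html><body>
  <h3><span>one</span></h3>
  <h3><span>two</span></h3>
  <h3><span>three</span></h3>
  <h3><span>four</span></h3>
  <h3><span>five</span></h3>
</body></html>')

values <- page %>% html_nodes("h3 span") %>% html_text()

# Pipe-friendly form: extract2(4) is the same as `[[`(values, 4).
fourth_pipe <- values %>% extract2(4)

# Plain base R indexing gives the same element.
fourth_base <- values[[4]]

# Optional guard: return NA instead of an error when there are
# fewer than n matches.
n <- 4
fourth_safe <- if (length(values) >= n) values[[n]] else NA_character_

fourth_pipe
```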