Web Scraping from AMAZON.IN in R

r
data_science
webscraping

#1

I’m trying to scrape reviews from Amazon using the following R code (the CSS selector was found with SelectorGadget):
library(rvest)
library(XML)

# Amazon Reviews

aurl <- "https://www.amazon.in/Apple-MMGF2HN-13-3-inch-Integrated-Graphics/product-reviews/B01FUK9TKG/ref=cm_cr_arp_d_paging_btm_2?showViewpoints=1&pageNumber="
amazon_reviews <- NULL
for (i in 1:10) {
  murl <- read_html(as.character(paste(aurl, i, "")))
  rev <- murl %>%
    html_nodes(".review-text") %>%
    html_text()
  amazon_reviews <- c(amazon_reviews, rev)
}
write.table(amazon_reviews, "apple.txt")

I’m running the loop to extract pages 1 to 10, but the output contains the reviews from page 1 repeated 10 times instead of the reviews from pages 1 to 10.


#2

Hi,

I think the issue is with the link you are using for the pages. Looking at the source code, each page is being redirected using the following URL.

Base URL: https://www.amazon.in

/Apple-MMGF2HN-13-3-inch-Integrated-Graphics/product-reviews/B01FUK9TKG/ref=cm_cr_arp_d_paging_btm_2?ie=UTF8&pageNumber=2
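
As a rough sketch of how the per-page links could be built from the base URL and path above: the names `base_url`, `review_path`, `page_urls` and `scrape_reviews` are made up for illustration, and I'm assuming the page number is simply appended after `pageNumber=`. I have not verified that Amazon serves these pages to a script, so treat this as a starting point rather than a confirmed fix.

```r
# Base URL and review path as quoted above; the page number is assumed
# to be appended directly after "pageNumber=".
base_url <- "https://www.amazon.in"
review_path <- "/Apple-MMGF2HN-13-3-inch-Integrated-Graphics/product-reviews/B01FUK9TKG/ref=cm_cr_arp_d_paging_btm_2?ie=UTF8&pageNumber="

# paste0() joins its arguments with no separator. paste() inserts a space
# by default, so paste(aurl, i, "") produces "...pageNumber= 2 ", which may
# also contribute to the same page being returned every time.
page_urls <- paste0(base_url, review_path, 1:10)

# Hypothetical helper: fetch one page and return the review texts.
# Uses the same ".review-text" selector as the original code.
scrape_reviews <- function(url) {
  page <- rvest::read_html(url)
  rvest::html_text(rvest::html_nodes(page, ".review-text"))
}

# Uncomment to run against the live site (requires rvest and network access):
# amazon_reviews <- unlist(lapply(page_urls, scrape_reviews))
# write.table(amazon_reviews, "apple.txt")
```

Printing `page_urls` before scraping is a quick way to check that each link ends in a different page number and contains no stray spaces.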

Hope this helps.