Clicking a button with RSelenium to turn pages on Amazon

Question (votes: 0, answers: 1)

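I can't get RSelenium to turn the page in the Amazon reviews section I'm trying to scrape. My code is below. I have tried nearly every combination of CSS selectors and XPath expressions. Any ideas?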

replicate(100, {
  remDr$navigate("https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/product-reviews/B00K30GXK8/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews")
  webElem <- remDr$findElement(using = "css selector", "body")
  webElem$sendKeysToElement(list(key = "end"))
  morereviews <- remDr$findElement(using = "css selector", ".a-last a")
  morereviews$clickElement()
  Sys.sleep(4)

  reviews <- xml2::read_html(remDr$getPageSource()[[1]]) %>%
    rvest::html_nodes(".review-text") %>%
    dplyr::data_frame(reviews = .)
})
r web-scraping rselenium
1 answer
0 votes

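In this case you don't need RSelenium; rvest alone is enough. First, you can read the reviews on any single page directly. Second, notice that every time you turn the page in the reviews section, the URL changes too (in fact, it contains the page number you are viewing). So you can loop over the page number in the URL and scrape all the reviews: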

library(magrittr)  # provides the %>% pipe

reviews <- lapply(1:100, function(i) {
  # The page number appears both in the ref segment and in pageNumber.
  url <- paste0("https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/product-reviews/B00K30GXK8/ref=cm_cr_getr_d_paging_btm_next_", i, "?ie=UTF8&reviewerType=all_reviews&pageNumber=", i)
  xml2::read_html(url) %>%
    rvest::html_nodes(".review-text") %>%
    rvest::html_text() %>%
    tibble::tibble(reviews = .)
})
# Stack the per-page tables into one and print the result.
(reviews <- do.call("rbind", reviews))
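
If the product has fewer than 100 pages of reviews, or a request fails partway through, a loop over the page number may be easier to manage when it can stop early. The following is only a rough sketch under a few assumptions: `review_page_url()`, `scrape_reviews()` and `fake_fetch()` are hypothetical helpers that are not part of rvest or any other package, and the URL pattern is simply copied from the thread, so Amazon may no longer serve it or may require a login. The fetcher is passed in as a function so the example runs offline with a stand-in.

```r
# Hypothetical helper: build the URL for one page of reviews.
# The path and query parameters mirror the URL used in this thread.
review_page_url <- function(page) {
  paste0(
    "https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/",
    "product-reviews/B00K30GXK8/ref=cm_cr_getr_d_paging_btm_next_", page,
    "?ie=UTF8&reviewerType=all_reviews&pageNumber=", page
  )
}

# Hypothetical helper: walk pages 1..max_pages and stop early when a page
# fails to load or yields no reviews. `fetch_reviews` is any function that
# takes a URL and returns a character vector of review texts.
scrape_reviews <- function(fetch_reviews, max_pages = 100, delay = 0) {
  pages <- list()
  for (i in seq_len(max_pages)) {
    texts <- tryCatch(fetch_reviews(review_page_url(i)),
                      error = function(e) character(0))
    if (length(texts) == 0) break  # past the last page, or the request failed
    pages[[i]] <- data.frame(page = i, review = texts,
                             stringsAsFactors = FALSE)
    Sys.sleep(delay)  # pause between requests
  }
  do.call(rbind, pages)
}

# Offline stand-in for a real fetcher: pretends only pages 1 to 3 exist.
fake_fetch <- function(url) {
  page <- as.integer(sub(".*pageNumber=", "", url))
  if (page > 3) stop("no such page")
  paste("review", 1:2, "from page", page)
}

result <- scrape_reviews(fake_fetch, max_pages = 10)

# For live use, a fetcher along these lines could be passed instead
# (assumes the pages still mark review bodies with ".review-text"):
# real_fetch <- function(url) {
#   rvest::html_text(rvest::html_nodes(xml2::read_html(url), ".review-text"))
# }
```

With the stand-in fetcher, `result` contains the rows from the three pages that "exist" and the loop stops at page 4. For live scraping, a non-zero `delay` keeps the requests spaced out.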