Scraping data from Google Reviews

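I'm trying to scrape data from Google Reviews (star rating, review text, date, etc.).

I adapted some code I found online, but I can't get it to work properly. R seems unable to pick up all the reviews and returns only the first seven. As far as I can tell, those are the ones Google displays before you scroll.

Has anyone run into the same problem? Thanks!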

#install.packages("rvest")
#install.packages("xml2")
#install.packages("RSelenium")

library(rvest)
library(xml2)
library(RSelenium)

rmDr <- rsDriver(browser = "firefox")
driver <- rmDr$client
driver$navigate("https://www.google.com/search?client=firefox-b-d&q=emporio+santa+maria#lrd=0x94ce576a4e45ed99:0xa36a342d3ceb06c3,1,,,")

Sys.sleep(5)
webEle <- driver$findElement(using = "css",value = ".review-dialog-list")

for(i in 1 : 15){
  webEle$sendKeysToElement(sendKeys = list(key = "page_down"))
  Sys.sleep(2)
}

#loop and simulate clicking on all "click on more" elements-------------
webEles <- myclient$findElement(using = "css",value = ".review-more-link")
for(webEle in webEles){
  tryCatch(webEle$clickElement(),error=function(e){print(e)}) # trycatch to prevent any error from stopping the loop
}
pagesource= myclient$getPageSource()[[1]]
#this should get you the full review, including translation and original text-------------
reviews=read_html(pagesource) %>%
  html_nodes(".review-full-text") %>%
  html_text()

#number of stars
stars <- read_html(pagesource) %>%
  html_node(".review-dialog-list") %>%
  html_nodes("g-review-stars > span") %>%
  html_attr("aria-label")


#time posted
post_time <- read_html(pagesource) %>%
  html_node(".review-dialog-list") %>%
  html_nodes(".dehysf") %>%
  html_text()

I get the following error message:

> webEles <- myclient$findElement(using = "css",value = ".review-more-link")
Error in checkError(res) : 
  Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
> for(webEle in webEles){
+   tryCatch(webEle$clickElement(),error=function(e){print(e)}) # trycatch to prevent any error from stopping the loop
+ }
<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused>
<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused>
<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused>
<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused>
<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused>
<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused>
<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused>
> pagesource= myclient$getPageSource()[[1]]
Error in checkError(res) : 
  Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
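The log points to two problems in the code above. First, `myclient` is never created in this script, because the client returned by `rsDriver()` is `driver`. A leftover `myclient` from an earlier session would most likely explain the "Failed to connect to localhost port 4444" errors. For example, a `remoteDriver()` object defaults to port 4444, while `rsDriver()` starts its server on port 4567 by default. Second, `findElement()` returns a single element, so looping over every "more" link needs `findElements()`. The sketch below shows the same steps with both fixes applied. The names `scrape_reviews()` and `parse_stars()` are hypothetical. The CSS classes (`.review-dialog-list`, `.review-more-link`, `.review-full-text`, `.dehysf`) are copied from the question and may no longer match Google's current markup. `parse_stars()` assumes the star `aria-label` holds the rating as its first number, as in "Rated 4.0 out of 5,".

```r
# Sketch only: scrape_reviews() and parse_stars() are hypothetical helpers.
# Packages are called as pkg::fun, so defining these functions needs no
# package; rvest and a running RSelenium client are needed only to use them.

# Turn star aria-labels such as "Rated 4.0 out of 5," into numbers.
# ASSUMPTION: the rating is the first number in the label.
parse_stars <- function(labels) {
  # Keep the first number, with an optional "." or "," decimal part.
  num <- sub("^[^0-9]*([0-9]+([.,][0-9]+)?).*$", "\\1", labels)
  # sub() returns labels without any digit unchanged; mark those NA.
  num[!grepl("[0-9]", labels)] <- NA
  as.numeric(chartr(",", ".", num))
}

# Scroll the review pane, expand truncated reviews, then parse the page.
# `driver` must be the client returned by rsDriver(), i.e. rsDriver(...)$client.
scrape_reviews <- function(driver, n_scroll = 15, pause = 2) {
  pane <- driver$findElement(using = "css", value = ".review-dialog-list")
  # The pane lazy-loads, so without scrolling only the first few reviews exist.
  for (i in seq_len(n_scroll)) {
    pane$sendKeysToElement(list(key = "page_down"))
    Sys.sleep(pause)
  }
  # findElements() (plural) returns a list of elements that can be looped over.
  more_links <- driver$findElements(using = "css", value = ".review-more-link")
  for (link in more_links) {
    # Report, but do not stop on, links that cannot be clicked.
    tryCatch(link$clickElement(),
             error = function(e) message(conditionMessage(e)))
  }
  # Read the page through the same client, not a separate `myclient`.
  page <- rvest::read_html(driver$getPageSource()[[1]])
  pane_node <- rvest::html_node(page, ".review-dialog-list")
  # The three vectors can differ in length, so return a list, not a data frame.
  list(
    text  = rvest::html_text(rvest::html_nodes(page, ".review-full-text")),
    stars = parse_stars(rvest::html_attr(
      rvest::html_nodes(pane_node, "g-review-stars > span"), "aria-label")),
    time  = rvest::html_text(rvest::html_nodes(pane_node, ".dehysf"))
  )
}

# Usage (needs Firefox and a working Selenium setup):
# rD <- RSelenium::rsDriver(browser = "firefox")
# driver <- rD$client
# driver$navigate("<the Google search URL from the question>")
# Sys.sleep(5)
# res <- scrape_reviews(driver)
```

Returning a list keeps the three vectors separate. Only expanded reviews carry `.review-full-text`, so the vectors will not necessarily line up row by row.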
Tags: r, rvest, rselenium

1 answer, 0 votes

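The official Google Maps (Google Places) API returns at most five reviews for a given place ID. Google also offers the Google My Business API. To get access, however, you must be a business registered with Google and apply for API access, which is not a quick process. In the meantime, a number of startups offer their own APIs for collecting reviews from sources such as Google Maps, Yandex Maps, Foursquare and Yelp. You could try one of them to get Google Maps reviews, Yandex Maps reviews and so on (for example, https://allreviews.app).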
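For the official route mentioned above, a Place Details request can ask for just the `reviews` field. The sketch below makes several assumptions: the legacy endpoint `https://maps.googleapis.com/maps/api/place/details/json`, an API key with the Places API enabled, and a Places place ID. As far as I know, a place ID is the `ChIJ...` style identifier, not the hex pair in the question's search URL. `place_details_url()` and `fetch_reviews()` are hypothetical helper names, and `jsonlite` is needed only when `fetch_reviews()` is actually called. Whichever sort order you choose, the response still contains at most five reviews.

```r
# Sketch only: place_details_url() and fetch_reviews() are hypothetical.
# ASSUMPTION: legacy Places API "Place Details" endpoint and parameters.
place_details_url <- function(place_id, key,
                              sort = c("most_relevant", "newest")) {
  sort <- match.arg(sort)
  params <- c(place_id = place_id, fields = "reviews",
              reviews_sort = sort, key = key)
  # Percent-encode each value so IDs and keys are safe inside the query string.
  values <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0("https://maps.googleapis.com/maps/api/place/details/json?",
         paste(names(params), values, sep = "=", collapse = "&"))
}

# Fetch the reviews and return them as a data frame (at most five rows).
fetch_reviews <- function(place_id, key, sort = "newest") {
  resp <- jsonlite::fromJSON(place_details_url(place_id, key, sort))
  if (!identical(resp$status, "OK")) {
    stop("Places API returned status: ", resp$status)
  }
  resp$result$reviews
}
```

A call would look like `fetch_reviews("YOUR_PLACE_ID", Sys.getenv("GOOGLE_MAPS_KEY"))`, where both arguments are placeholders. In my understanding, the returned data frame has columns such as `author_name`, `rating`, `text` and `time`.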

© www.soinside.com 2019 - 2024. All rights reserved.