How to get Google geographic coordinates from a "div" tag with class "tablist" using the RSelenium R package


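I am trying to get the geographic coordinates from an HTML page using functions from the RSelenium package for R. The goal is to get the value 20º27'36.1"S 54º38'03.1"W. My attempts are in the code below. I would appreciate any help.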

library(rvest)
library(RSelenium)
library(httpuv)

port <- httpuv::randomPort()

rD <- rsDriver(browser = c("firefox"),
               verbose=TRUE,
               check = FALSE,
               port = port)

driver <- rD[["client"]]

urll <- "https://www.zapimoveis.com.br/lancamento/venda-apartamento-2-quartos-bairro-seminario-campo-grande-ms-46m2-id-2600496487/"
driver$navigate(urll)

politicas <- driver$findElement(using = "css",
                                value = "button.cookie-notifier__cta")
politicas$clickElement()

botaomapa <- driver$findElement(using = "xpath", "/html/body/main/div[1]/section/section/section[1]/button[2]")
botaomapa$clickElement()

#Attempt 1: using xpath from coordinates
coord <- driver$findElement(using = "xpath", "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # error

#Attempt 2: by botaomapa object
coord <- botaomapa$findElement(using = "xpath", "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # error

#Attempt 3: by rvest package
readmap <- read_html(urll)
auxiliar <- readmap %>% html_elements("section")
auxiliar2 <- auxiliar%>%html_elements("#listing-map")
c1 <- readmap %>% html_nodes(xpath = "/html/body/main/div[1]/section/section/section[2]/div/div[3]/article/iframe") # returns nothing
c2 <- auxiliar2 %>% html_nodes(xpath = "/html/body/main/div[1]/section/section/section[2]/div/div[3]/article/iframe") # returns nothing
c3 <- auxiliar2 %>% html_nodes(xpath = "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # returns nothing
Answer

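The tricky part is that the map is embedded in an iframe, which makes anything inside the iframe hard to access. It turns out, however, that you can find the iframe itself and read its attributes. The link in the iframe's src= attribute contains the coordinates, so you can extract the iframe link and then pull the coordinates out of it.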

After this step in your original code:

politicas$clickElement()

I did the following:

library(stringr)
library(rvest)

# pull the webpage html
html <- driver$getPageSource()[[1]]



# look for the iframe's node
# then pull the source attribute
map_link <- html %>% 
  read_html() %>% 
  html_node(".map-embed__iframe") %>%
  html_attr("src")

Here is what the link looks like:

map_link
[1] "https://www.google.com/maps/embed/v1/place?key=AIzaSyB1BH90qSMLRWrSEKe8D7fml7-kWHN2qjY&q=-20.460039,-54.634191"

Then you can use a regular expression (or any other string method) to extract the coordinates:

# remove everything up to and including "q="

map_link %>% str_remove(".*q=")
[1] "-20.460039,-54.634191"
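`str_remove()` leaves the pair as a single string. If you need the values as numbers, a minimal base-R sketch like the one below could work. It assumes, as in the link above, that the coordinates sit in the `q=` parameter as "lat,lon"; `parse_embed_coords()` is a hypothetical helper name, not part of stringr or rvest. Unlike `str_remove(".*q=")`, it stops at the next `&`, so it should still work if `q` is not the last parameter in the URL.

```r
# Hypothetical helper (not part of stringr/rvest): pull "lat,lon" out of the
# q= parameter of a Google Maps embed URL and return it as a named numeric vector.
# Assumes q holds exactly two comma-separated decimal numbers.
parse_embed_coords <- function(link) {
  # Match "?q=" or "&q=" and capture everything up to the next "&" or the end
  m <- regmatches(link, regexec("[?&]q=([^&]+)", link))[[1]]
  if (length(m) < 2) stop("no q= parameter found in link")
  # Decode in case the comma arrives percent-encoded as %2C
  parts <- strsplit(utils::URLdecode(m[2]), ",", fixed = TRUE)[[1]]
  coords <- suppressWarnings(as.numeric(parts))
  if (length(coords) != 2 || anyNA(coords)) stop("q= does not hold a lat,lon pair")
  c(lat = coords[1], lon = coords[2])
}

# The iframe src value shown above
map_link <- "https://www.google.com/maps/embed/v1/place?key=AIzaSyB1BH90qSMLRWrSEKe8D7fml7-kWHN2qjY&q=-20.460039,-54.634191"
coords <- parse_embed_coords(map_link)
print(coords)
```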

When I entered these coordinates into Google, the location matched the original map.
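The question asked for degrees/minutes/seconds (20º27'36.1"S 54º38'03.1"W), while the embed link gives decimal degrees. A minimal base-R sketch of the conversion, assuming seconds are shown to one decimal place as in the target value; `to_dms()` is a hypothetical helper, and it writes the degree sign ° rather than the º used in the question.

```r
# Hypothetical helper: format a decimal-degree value as D°MM'SS.s" plus a
# hemisphere letter (pos for values >= 0, neg for values < 0).
# Seconds are rounded to one decimal place.
to_dms <- function(x, pos, neg) {
  hemi <- if (x < 0) neg else pos
  x <- abs(x)
  d <- floor(x)                   # whole degrees
  m_full <- (x - d) * 60          # remaining fraction as minutes
  m <- floor(m_full)              # whole minutes
  s <- round((m_full - m) * 60, 1)
  # Carry over when the seconds round up to 60.0
  if (s >= 60) { s <- 0; m <- m + 1 }
  if (m >= 60) { m <- 0; d <- d + 1 }
  sprintf("%d\u00b0%02d'%04.1f\"%s", d, m, s, hemi)
}

# Decimal coordinates taken from the embed link's q= parameter
lat <- -20.460039
lon <- -54.634191
dms <- paste(to_dms(lat, "N", "S"), to_dms(lon, "E", "W"))
cat(dms, "\n")
# prints 20°27'36.1"S 54°38'03.1"W
```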
