Web scraping HTML with rvest and R


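I want to scrape the website https://www.askramar.com/Ponuda. As a first step, I need to collect all the links that point to the individual car pages. The links look like this in the HTML structure: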

[Image: screenshot of the HTML structure]

I tried the following code, but got an empty object in R:

library(rvest)

url <- "https://www.askramar.com/Ponuda"
html_document <- read_html(url)

# Select every element whose class list contains "vozilo" and read its href
links <- html_document %>%
  html_nodes(xpath = '//*[contains(concat(" ", @class, " "), concat(" ", "vozilo", " "))]') %>%
  html_attr(name = "href")

Is the content generated by JavaScript on the page? Please help. Thanks!

r web-scraping web-crawler phantomjs rvest
1 Answer

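Yes, the page uses JavaScript to load the content you are interested in. However, it does so simply by sending an XHR GET request to https://www.askramar.com/Ajax/GetResults.cshtml. You can request that URL directly: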

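library(rvest)

# Request the endpoint that the page itself calls via XHR
url <- "https://www.askramar.com/Ajax/GetResults.cshtml"

html_document <- read_html(url)

# Collect the href of every anchor whose href contains "Vozilo"
links <- html_document %>%
  html_nodes(xpath = '//a[contains(@href, "Vozilo")]') %>%
  html_attr(name = "href")

print(links)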
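
If the extracted hrefs turn out to be relative paths, they may need to be resolved against the site root before each car page can be requested. The following is a minimal offline sketch of that step. The HTML fragment and the paths in it (`/Vozilo/1`, `/Kontakt`) are made up for illustration and only stand in for whatever the endpoint actually returns; the XPath is the one used above.

```r
library(rvest)

# Hypothetical fragment standing in for the XHR response; the real markup
# returned by GetResults.cshtml may look different.
fragment <- '
<div class="vozilo">
  <a href="/Vozilo/1"><span>Car A</span></a>
</div>
<div class="vozilo">
  <a href="/Vozilo/2"><span>Car B</span></a>
</div>
<a href="/Kontakt">Contact</a>'

page <- read_html(fragment)

# Same XPath as above: anchors whose href contains "Vozilo"
links <- page %>%
  html_nodes(xpath = '//a[contains(@href, "Vozilo")]') %>%
  html_attr(name = "href")

# Drop duplicates and resolve relative paths against the site root
links <- unique(links)
full_links <- xml2::url_absolute(links, "https://www.askramar.com/")

print(full_links)
```

Each element of `full_links` can then be passed to `read_html()` to scrape the corresponding car page, assuming those pages are served as static HTML.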