Downloading information with rvest

Question

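I want to use the rvest package to download information from a website. The information sits in the HTML inside a div with class="col-sm-8". How can I do this?

The approach I usually follow does not work: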

    url <- "https://www.assonime.it/attivita-editoriale/Pagine/pubblicazioni.aspx"

    pagina <- read_html(url)

    titoli <- pagina %>%
      html_nodes("col-sm-8") %>%
      html_text()

Answer 1

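The content of this page is rendered with JavaScript, and read_html() does not execute any JavaScript. You either have to use a scraping technique that renders the whole page in a headless browser (e.g. RSelenium), or write requests against their API (e.g. with httr).

The development version of rvest provides read_html_live(), which worked for me.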

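Note: in your code, you need to put a dot (.) in front of the CSS selector, i.e. ".col-sm-8" instead of "col-sm-8", to tell the parser to look for elements with that class.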

# Install the rvest dev version
#remotes::install_github("tidyverse/rvest")

library(rvest)

url <- "https://www.assonime.it/attivita-editoriale/Pagine/pubblicazioni.aspx"

# The first time I tried this, the page timed out, 2nd try worked for me.
pagina <- read_html_live(url)

titoli <- pagina |>
  html_elements(".col-sm-8") |>
  html_text()

titoli
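
The selector note can be checked offline, without touching the live site. The sketch below is only an illustration: the markup passed to minimal_html() is a made-up stand-in, not the real structure of the page, and the names doc, no_dot and with_dot are hypothetical.

```r
library(rvest)

# Hypothetical stand-in markup; the real page's structure may differ.
doc <- minimal_html('
  <div class="col-sm-8"><p>First title</p></div>
  <div class="col-sm-4"><p>Sidebar</p></div>
')

# Without the dot, "col-sm-8" is a type selector: it looks for
# <col-sm-8> elements, which do not exist, so nothing is matched.
no_dot <- html_elements(doc, "col-sm-8")
length(no_dot)

# With the dot, ".col-sm-8" is a class selector and matches the div.
with_dot <- html_elements(doc, ".col-sm-8")
html_text2(with_dot)
```

This only shows why the original selector returned nothing. On the real page the class selector is still not enough with read_html(), because the content is rendered by JavaScript.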
