How can I scrape the complete dataset from Yahoo Finance with rvest?

Question

I am trying to get the complete dataset of Bitcoin historical data from Yahoo Finance via web scraping. This is my first attempt:

library(rvest)
library(tidyverse)

crypto_url <- read_html("https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true")
cryp_table <- html_nodes(crypto_url,css = "table")
cryp_table <- html_table(cryp_table, fill = TRUE) %>% 
  as.data.frame()

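The link I pass to read_html() has a long date range selected, but it only fetches the first 101 rows, and the last row is the loading message you get when you keep scrolling. This is my second attempt, but I got the same result: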

col_page <- read_html("https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true")
cryp_table <- 
  col_page %>% 
  html_nodes(xpath = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table') %>% 
  html_table(fill = TRUE)
cryp_final <- cryp_table[[1]]

How can I get the entire dataset?

Answer 1

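I think you can use the download link instead. If you inspect the network requests in your browser, you will see the download link, which in this case is:

"https://query1.finance.yahoo.com/v7/finance/download/BTC-USD?period1=1480464000&period2=1638230400&interval=1d&events=history&includeAdjustedClose=true"

This link looks a lot like the URL of the page itself, which means we can modify the page URL to obtain the download link and then read the CSV. See the code: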

library(stringr)
library(magrittr)

site <- "https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true"

base_download <- "https://query1.finance.yahoo.com/v7/finance/download/"

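download_link <- site %>% 
  # Drop everything up to and including "quote/", the "/history" path segment
  # (the "?" that starts the query string is kept), and the "&frequency=1d" parameter.
  stringr::str_remove_all(".+(?<=quote/)|/history?|&frequency=1d") %>% 
  # The download endpoint uses "events=history" where the page uses "filter=history".
  stringr::str_replace("filter", "events") %>% 
  # Prepend the download endpoint to the remaining "BTC-USD?period1=..." part.
  stringr::str_c(base_download, .)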

readr::read_csv(download_link)
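
As an alternative to editing the page URL with regular expressions, the download link can also be assembled directly from its parts. The following is a minimal sketch and not part of the original answer: `build_download_url()` is a hypothetical helper, and it assumes that the query parameters visible in the link above (`period1`, `period2`, `interval`, `events`, `includeAdjustedClose`) are what the endpoint expects, with both periods given as Unix timestamps at midnight UTC.

```r
# Hypothetical helper: build the CSV download URL from a symbol and a date range.
build_download_url <- function(symbol, from, to, interval = "1d") {
  base_download <- "https://query1.finance.yahoo.com/v7/finance/download/"
  # Convert a calendar date to a Unix timestamp (seconds since 1970-01-01 UTC),
  # formatted without scientific notation.
  to_unix <- function(date) {
    sprintf("%.0f", as.numeric(as.POSIXct(date, tz = "UTC")))
  }
  paste0(
    base_download, symbol,
    "?period1=", to_unix(from),
    "&period2=", to_unix(to),
    "&interval=", interval,
    "&events=history&includeAdjustedClose=true"
  )
}

download_link <- build_download_url("BTC-USD", "2016-11-30", "2021-11-30")
```

The dates 2016-11-30 and 2021-11-30 correspond to the `period1` and `period2` values in the question's URL, so `download_link` here is the same link shown above and can be passed to `readr::read_csv()` in the same way.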

Answer 2

Does anyone else get this error: Error in open.connection(3L, "rb") : HTTP error 403.
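
One possible cause of a 403 is that the server rejects requests that do not look like they come from a browser. The sketch below assumes that is what is happening here, which this thread does not confirm, and the endpoint may require more than a User-Agent header. `fetch_csv()` is a hypothetical helper, and the `"Mozilla/5.0"` agent string is only a placeholder. It uses the httr package to send the request and base R to parse the body.

```r
# Hypothetical helper: request a CSV with a browser-like User-Agent header.
# Assumes the httr package is installed.
fetch_csv <- function(url, agent = "Mozilla/5.0") {
  resp <- httr::GET(url, httr::user_agent(agent))
  # Convert any HTTP error status (such as 403) into an R error.
  httr::stop_for_status(resp)
  # Read the response body as text and parse it as CSV.
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  read.csv(text = body, stringsAsFactors = FALSE)
}
```

It would be called as `fetch_csv(download_link)` in place of `readr::read_csv(download_link)`. If the request still fails, `stop_for_status()` reports the status code, which helps tell a blocked client apart from a bad URL.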
