我试图通过网络从雅虎财经获取比特币历史数据的完整数据集scraping,这是我的第一个选项代码块:
library(rvest)
library(tidyverse)
crypto_url <- read_html("https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true")
cryp_table <- html_nodes(crypto_url,css = "table")
cryp_table <- html_table(cryp_table,fill = T) %>%
as.data.frame()
我提供给 read_html() 很长一段时间的链接已经被选中,但是它只获取前 101 行,最后一行是你继续滚动时收到的加载消息,这是我的第二次拍摄,但是我也有同样的想法:
col_page <- read_html("https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true")
cryp_table <-
col_page %>%
html_nodes(xpath = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table') %>%
html_table(fill = T)
cryp_final <- cryp_table[[1]]
如何获取整个数据集?
我想你可以获得下载链接,如果你查看网络,你会看到下载链接,在这种情况下:
“https://query1.finance.yahoo.com/v7/finance/download/BTC-USD?period1=1480464000&period2=1638230400&interval=1d&events=history&includeAdjustedClose=true”
嗯,这个链接看起来像网站的url,也就是说,我们可以修改url链接来获取下载链接并读取csv。看代码:
library(stringr)
library(magrittr)
site <- "https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true"
base_download <- "https://query1.finance.yahoo.com/v7/finance/download/"
download_link <- site %>%
stringr::str_remove_all(".+(?<=quote/)|/history?|&frequency=1d") %>%
stringr::str_replace("filter", "events") %>%
stringr::str_c(base_download, .)
readr::read_csv(download_link)
有没有报错:Error in open.connection(3L, "rb") : HTTP error 403.