How to load a large dataset from BigQuery into R?


I have tried two approaches with the bigrquery package:

library(bigrquery)
library(DBI)

sql <- "YOUR LARGE QUERY HERE" # long query saved as a View; the SELECT from it goes here

# Attempt 1: DBI interface
con <- dbConnect(
  bigrquery::bigquery(),
  project = "YOUR PROJECT ID HERE",
  dataset = "YOUR DATASET"
)
test <- dbGetQuery(con, sql, n = 10000, max_pages = Inf)

# Attempt 2: low-level API
project <- "YOUR PROJECT ID HERE"
tb <- bigrquery::bq_project_query(project, sql)
bq_table_download(tb, max_results = 1000)

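but both fail with the error "Error: Requested Resource Too Large to Return [responseTooLarge]". There is a possibly related issue here, but I am interested in any tool that gets the job done: I have already tried the solutions outlined here, and they failed.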

How can I load a large dataset from BigQuery into R?

r google-bigquery bigrquery
2 Answers

As @hrbrmstr suggested, the documentation specifically mentions:

> #' @param page_size The number of rows returned per page. Make this smaller
> #'   if you have many fields or large records and you are seeing a
> #'   'responseTooLarge' error.
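Following that hint, here is a minimal sketch that downloads the whole result in small pages. It assumes a bigrquery version whose `bq_table_download()` accepts `n_max` and `page_size` (older releases used `max_results` instead); the helper name `download_in_small_pages` and the default of 1000 rows per page are illustrative choices, not values from the documentation.

```r
# Hypothetical helper: run a query and download the full result in small pages.
# Smaller pages mean smaller API responses, which is what the
# 'responseTooLarge' hint in the documentation refers to.
download_in_small_pages <- function(billing, sql, page_size = 1000) {
  # Run the query; the result is written to a temporary table.
  tb <- bigrquery::bq_project_query(billing, sql)
  # Fetch every row (n_max = Inf), requesting page_size rows per call.
  bigrquery::bq_table_download(tb, n_max = Inf, page_size = page_size)
}

# Usage (requires BigQuery credentials):
# df <- download_in_small_pages("YOUR PROJECT ID HERE", sql, page_size = 500)
```

If 1000 rows per page still triggers the error (for example with very wide or nested records), lowering `page_size` further trades more API calls for smaller responses.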

In this r-project.org documentation, the explanation of this function (page 13) gives a different recommendation:

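> This retrieves rows in chunks of page_size. It is most suitable for results of smaller queries (<100 MB, say). For larger queries, it is better to export the results to a CSV file stored on Google Cloud and use the bq command line tool to download them locally.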
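A sketch of the export route, assuming the query result can be written to a Cloud Storage bucket you control. `bq_project_query()` and `bq_table_save()` are bigrquery functions; the helper name `export_query_to_gcs` and the `gs://YOUR-BUCKET/...` URI are hypothetical placeholders.

```r
# Hypothetical helper: materialize a large query result and export it to
# Cloud Storage as CSV files instead of paging it through the API.
export_query_to_gcs <- function(billing, sql, gcs_uri) {
  # The query result is written to a temporary table.
  tb <- bigrquery::bq_project_query(billing, sql)
  # Export the table as CSV. A '*' wildcard in the URI lets BigQuery split
  # the output into several files, which is required for exports over 1 GB.
  bigrquery::bq_table_save(tb, destination_uris = gcs_uri,
                           destination_format = "CSV")
  invisible(tb)
}

# Usage (bucket name is a placeholder):
# export_query_to_gcs("YOUR PROJECT ID HERE", sql,
#                     "gs://YOUR-BUCKET/large-result-*.csv")
```

Once the export finishes, the CSV files can be copied to a local directory with the Google Cloud CLI, for example `gsutil cp "gs://YOUR-BUCKET/large-result-*.csv" .`, and read into R one file at a time.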



I am just getting started with BigQuery myself, but I think it should look something like this.

You can install the current bigrquery release from CRAN:

install.packages("bigrquery")

Or install the latest development version from GitHub:

install.packages('devtools')
devtools::install_github("r-dbi/bigrquery")

Usage

Low-level API

library(bigrquery)
billing <- bq_test_project() # replace this with your project ID 
sql <- "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`"

tb <- bq_project_query(billing, sql)
#> Auto-refreshing stale OAuth token.
bq_table_download(tb, max_results = 10)

DBI

library(DBI)

con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = billing
)
con 
#> <BigQueryConnection>
#>   Dataset: publicdata.samples
#>   Billing: bigrquery-examples

dbListTables(con)
#> [1] "github_nested"   "github_timeline" "gsod"            "natality"       
#> [5] "shakespeare"     "trigrams"        "wikipedia"

dbGetQuery(con, sql, n = 10)

dplyr

library(dplyr)

natality <- tbl(con, "natality")

natality %>%
  select(year, month, day, weight_pounds) %>% 
  head(10) %>%
  collect()
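When the full table is not actually needed in R, another option is to let BigQuery do the aggregation and `collect()` only the reduced result. A sketch assuming the `con` connection and `natality` table from above, with dplyr (and its database backend dbplyr) installed and attached; the helper name `summarise_natality` and the chosen summary are illustrative.

```r
# Hypothetical helper: aggregate inside BigQuery, download only the summary.
# Assumes library(dplyr) has been called, as in the example above.
summarise_natality <- function(con) {
  tbl(con, "natality") %>%
    group_by(year) %>%
    summarise(
      n = n(),                                       # translated to COUNT(*)
      mean_weight = mean(weight_pounds, na.rm = TRUE) # translated to AVG()
    ) %>%
    arrange(year) %>%
    collect()  # only one row per year is transferred to R
}

# Usage:
# summarise_natality(con)
```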