Storing a new permanent table in a schema using compute

Question (votes: 0, answers: 3)

I want to use dbplyr syntax to perform some JOIN / FILTER operations on a few tables and store the result back in the database without collecting it first.

From what I have read, compute(..., temporary = FALSE, ...) should do that, but I am struggling to supply a fully qualified name (i.e. database.schema.table_name) for the table I want to store.

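I know about DBI::Id and dbplyr::in_schema, but I do not know how to use them correctly. An attempt with sql at least did what I wanted (the table was created), but resulted in a (spurious?) error.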

What do I need to do?

Some NoReprex (a non-reproducible example)

library(DBI)
library(dplyr)   # provides %>%, tbl(), inner_join() and compute()
library(dbplyr)

con <- dbConnect(odbc::odbc(), "myserver")

## do __not__ collect the data
my_frame <- con %>%
   tbl(Id(catalog = "mydb", schema = "dbo", table = "mytable")) %>%
   inner_join(con %>% tbl(Id(catalog = "mydb", schema = "dbo", 
                             table = "yetanothertable")),
              "id")

compute(my_frame,
        # Id(catalog = "mydb", schema = "dbo", table = "mynewtable"), # (1)
        # in_schema("dbo", "mynewtable"),                             # (2),
        sql("mydb.dbo.mynewtable"),                                   # (3)
        FALSE)

Depending on which variant I use, I get a different error:

# (1)
## Error in h(simpleError(msg, call)) : 
##   error in evaluating the argument 'conn' in selecting a method for function 
## 'dbQuoteIdentifier': argument "con" is missing, with no default

# (2)
## Error in escape(x$schema, con = con) : 
##   argument "con" is missing, with no default

# (3)
## Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ')'.  
##            [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared. 
## <SQL> 'SELECT *
## FROM (my.fully_qualified.name) "q02"
## WHERE (0 = 1)'

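P.S.: I really want to be able to save the table under a fully qualified name, i.e. one that includes the database name (although it is the same database in this simplified example). So in the long run dbConnect(..., database = <somedb>) would not solve my problem.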

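P.P.S.: I am looking for a compute solution. I know I could build the SQL myself, but I would really like to see whether this can be done through the dbplyr abstraction layer.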
    

r sql-server dbi dbplyr
3 Answers

2 votes
... this answer). But since you rule out that approach in your question, I tested and found a way to do this without writing SQL. We will use db_compute instead of compute.
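  • The compute documentation states that "compute() stores results in a remote temporary table". I take this to mean that compute cannot be used to write a permanent table.
  • The db_compute documentation says very little. But it is listed next to db_copy_to, whose purpose is similar to what we are looking for. So it is worth a try (and it works).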
    
    
General setup

library(DBI)
library(dplyr)
library(dbplyr)

# connect to database
connection_string = "..."
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)

# remote table
remote_table = tbl(db_connection, from = in_schema("schema","table"))
top_rows = remote_table %>%
  head()

Testing compute

top_rows %>% show_query()
# <SQL>
# SELECT TOP (6) *
# FROM [database_name].[schema_name].[table_name]

top_rows = top_rows %>% compute()
# Created a temporary table named: #dbplyr_002

top_rows %>% show_query()
# <SQL>
# SELECT *
# FROM #dbplyr_002

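So we can see that compute wrote a temporary table. If we were doing some complex processing (rather than just fetching the first few rows), compute would be an effective way to store the processed table, so that the processing is not repeated on every query.
But because the table is temporary, it should disappear when we disconnect from the database: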

DBI::dbDisconnect(db_connection)

Testing db_compute

out = db_compute(
  con = db_connection,
  table = in_schema("schema","new_table"),
  sql = sql_render(top_rows),
  temporary = FALSE
)

out
# <IDENT> database_name.schema_name.new_table

# reconnect
new_remote_table = tbl(db_connection, from = in_schema("schema","new_table"))

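So we can now access the new (permanent) table from within R. I also checked with an SQL query and confirmed that the table exists in the database.

Note: because db_compute is sparsely documented, it is unclear whether it is intended to be used this way. I have tested the above and it works. But without further documentation, use it at your own risk.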
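The persistence check described above can be sketched end to end on a throwaway SQLite file, so that it runs without a server. SQLite, the file path, and the table names are assumptions made for illustration only, and schema-qualified names are left out because SQLite has no equivalent of dbo. The sketch writes a table with db_compute(temporary = FALSE), disconnects, reconnects, and checks that the table is still there.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Throwaway SQLite file standing in for the real database (assumption)
db_file <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_file)
dbWriteTable(con, "source_table", data.frame(id = 1:10, value = letters[1:10]))

# Lazy query: nothing is collected into R
top_rows <- tbl(con, "source_table") %>% head()

# Write the query result to a permanent table
db_compute(con, table = "new_table", sql = sql_render(top_rows), temporary = FALSE)

# Disconnect and reconnect: a temporary table would be gone at this point
dbDisconnect(con)
con <- dbConnect(RSQLite::SQLite(), db_file)
still_there <- dbExistsTable(con, "new_table")
n_rows <- nrow(dbReadTable(con, "new_table"))
dbDisconnect(con)
```

On SQL Server the same check would use the schema-qualified name in place of "new_table".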

    


2 votes
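... compute, and everything works like a charm. You only need to supply the database name as part of the schema and wrap it in sql(), so that it is passed through verbatim instead of being quoted as a single identifier: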
compute(my_frame,
        # in_schema("mydb.dbo", "mynewtable") would __not__ work
        in_schema(sql("mydb.dbo"), "mynewtable"),
        FALSE)
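Why the sql() wrapper matters can be illustrated one level down, at the DBI layer. This is only a sketch of the quoting rule, not dbplyr's internal code path: it uses DBI::ANSI() as a stand-in connection (an assumption, so that no database is needed), and a real SQL Server backend may quote with different characters. A plain string containing dots is quoted as one identifier, a string marked as SQL is passed through untouched, and an Id is quoted component by component.

```r
library(DBI)

con <- ANSI()  # dummy ANSI-compliant connection, stand-in for a real server

# A plain character string is treated as ONE identifier, dots included
as_string <- dbQuoteIdentifier(con, "mydb.dbo")

# A string marked as literal SQL is returned unchanged
as_sql <- dbQuoteIdentifier(con, SQL("mydb.dbo"))

# An Id is quoted component by component and joined with dots
as_id <- dbQuoteIdentifier(
  con,
  Id(catalog = "mydb", schema = "dbo", table = "mynewtable")
)

print(as_string)
print(as_sql)
print(as_id)
```

The first form is the reason in_schema("mydb.dbo", "mynewtable") ends up pointing at a schema literally named mydb.dbo, which does not exist.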



0 votes

It turns out that compute does not work on a connection created with pool::dbPool(), whereas it does work on a connection created with DBI::dbConnect().

Example code that fails with pool and works with DBI

# libraries -----
library(dbplyr)
library(DBI)
library(pool)
library(dplyr)
library(odbc)

# connect using pool and con ----
pool <- pool::dbPool(drv = odbc::odbc(),
                     driver = "{ODBC17}",
                     uid = Sys.getenv("AZURE_UID"),
                     pwd = Sys.getenv("AZURE_PWD"),
                     server = Sys.getenv("AZURE_SERVER"),
                     database = Sys.getenv("AZURE_DATABASE"),
                     Authentication = "ActiveDirectoryPassword",
                     port = 1433,
                     minSize = 0, ## default is 1 which often leaves connections hanging (not cool)
                     idleTimeout = 600, ## 10 minutes until idle connection is closed, takes ~1 second to reopen
                     encoding = "UTF-8",
                     encrypt = "yes")

con <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "{ODBC Driver 17 for SQL Server}",
  Server = Sys.getenv("AZURE_SERVER"),
  Database = Sys.getenv("AZURE_DATABASE"),
  UID = Sys.getenv('AZURE_UID'),
  PWD = Sys.getenv('AZURE_PWD'),
  Port = 1433,
  Authentication = "ActiveDirectoryPassword",
  Encrypt = "yes"
)

# upload iris to synapse ----
#DBI::dbWriteTable(con, DBI::Id(schema = "my_schema", table = "iris"), value = iris, overwrite = TRUE) # slow but works

# POOL --this doesnt work -----
pool_iris <- tbl(pool, dbplyr::in_schema(sql("my_schema"),"iris"))
pool_query <- pool_iris %>% head(10)
compute(pool_query, dbplyr::in_schema(sql("my_schema"),"iris_computed_from_pool"), temporary = FALSE)
# Error in UseMethod("sql_escape_ident") :
#   no applicable method for 'sql_escape_ident' applied to an object of class "c('Pool', 'R6')"

# DBCONNECT - this works (same code) -----
con_iris <- tbl(con, dbplyr::in_schema(sql("my_schema"),"iris"))
con_query <- con_iris %>% head(10)
compute(con_query, dbplyr::in_schema(sql("my_schema"),"iris_computed_from_con"), temporary = FALSE)
# Source: table<my_schema."iris_computed_from_con"> [10 x 5]
#
# Database: Microsoft SQL Server 12.00.2531[[email protected]@sqlserversqldatabaseXXXX/sqldatabaseXXXX]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl> <chr>
# 1          4.6         3.1          1.5         0.2 setosa
# 2          5.5         3.5          1.3         0.2 setosa
# 3          4.9         3.6          1.4         0.1 setosa
# 4          6.4         3.1          5.5         1.8 virginica
# 5          4.4         3.2          1.3         0.2 setosa

sessioninfo::session_info()
# ─ Session info ───────────────────────────────────────────────────────────────
#  setting  value
#  version  R version 4.0.2 (2020-06-22)
#  os       OpenShift Enterprise
#  system   x86_64, linux-gnu
#  ui       RStudio
#  language (EN)
#  collate  en_CA.UTF-8
#  ctype    en_CA.UTF-8
#  tz       America/Toronto
#  date     2024-02-06
#  rstudio  2023.06.0+421.pro1 Mountain Hydrangea (server)
#  pandoc   3.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
#
# ─ Packages ───────────────────────────────────────────────────────────────────
#  ! package     * version date (UTC) lib source
#  P bit           4.0.5   2022-11-15 [?] RSPM (R 4.0.5)
#  P bit64         4.0.5   2020-08-30 [?] RSPM (R 4.0.5)
#  P blob          1.2.4   2023-03-17 [?] RSPM (R 4.0.5)
#  P cli           3.6.0   2023-01-09 [?] RSPM (R 4.0.5)
#  P DBI         * 1.1.3   2022-06-18 [?] CRAN (R 4.0.5)
#  P dbplyr      * 2.3.2   2023-03-21 [?] RSPM (R 4.0.5)
#  P digest        0.6.31  2022-12-11 [?] RSPM (R 4.0.5)
#  P dplyr       * 1.1.0   2023-01-29 [?] RSPM (R 4.0.5)
#  P evaluate      0.20    2023-01-17 [?] RSPM (R 4.0.5)
#  P fansi         1.0.4   2023-01-22 [?] CRAN (R 4.0.2)
#  P fastmap       1.1.1   2023-02-24 [?] RSPM (R 4.0.5)
#  P generics      0.1.3   2022-07-05 [?] RSPM (R 4.0.5)
#  P glue          1.6.2   2022-02-24 [?] RSPM (R 4.0.5)
#  P hms           1.1.3   2023-03-21 [?] RSPM (R 4.0.5)
#  P htmltools     0.5.7   2023-11-03 [?] RSPM (R 4.0.5)
#  P knitr         1.42    2023-01-25 [?] RSPM (R 4.0.5)
#  P later         1.3.0   2021-08-18 [?] RSPM (R 4.0.5)
#  P lifecycle     1.0.3   2022-10-07 [?] RSPM (R 4.0.5)
#    magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.0.5)
#  P odbc        * 1.4.2   2024-01-22 [?] RSPM (R 4.0.5)
#  P pillar        1.8.1   2022-08-19 [?] RSPM (R 4.0.5)
#  P pkgconfig     2.0.3   2019-09-22 [?] RSPM (R 4.0.3)
#  P pool        * 1.0.1   2023-02-21 [?] RSPM (R 4.0.5)
#  P purrr         1.0.1   2023-01-10 [?] RSPM (R 4.0.5)
#    R6            2.5.1   2021-08-19 [1] RSPM (R 4.0.5)
#  P Rcpp          1.0.10  2023-01-22 [?] CRAN (R 4.0.2)
#    renv          1.0.3   2023-09-19 [1] RSPM (R 4.0.2)
#  P rlang         1.1.0   2023-03-14 [?] RSPM (R 4.0.5)
#  P rmarkdown     2.20    2023-01-19 [?] RSPM (R 4.0.5)
#  P rstudioapi    0.15.0  2023-07-07 [?] RSPM (R 4.0.5)
#  P sessioninfo   1.2.2   2021-12-06 [?] RSPM (R 4.0.5)
#  P tibble        3.2.1   2023-03-20 [?] RSPM (R 4.0.5)
#  P tidyselect    1.2.0   2022-10-10 [?] RSPM (R 4.0.5)
#  P utf8          1.2.3   2023-01-31 [?] RSPM (R 4.0.5)
#  P vctrs         0.6.0   2023-03-16 [?] RSPM (R 4.0.5)
#  P withr         2.5.0   2022-03-03 [?] RSPM (R 4.0.5)
#  P xfun          0.37    2023-01-31 [?] RSPM (R 4.0.5)
#  P yaml          2.3.7   2023-01-23 [?] RSPM (R 4.0.5)
#
#  [1] /XXXX/renv/library/R-4.0/x86_64-pc-linux-gnu
#  [2] /opt/R/4.0.2/lib/R/library
#
#  P ── Loaded and on-disk path mismatch.
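If a pool is still wanted for the rest of the application, one possible workaround is to check a single connection out of the pool just for the compute() call. This is a sketch under assumptions rather than a confirmed fix for the error above: it runs against a throwaway SQLite file instead of SQL Server, the table names are made up, and it has not been tested with pool 1.0.1 over odbc. poolCheckout() and poolReturn() are pool's functions for borrowing and returning a real DBI connection.

```r
library(DBI)
library(dplyr)
library(dbplyr)
library(pool)

# Throwaway SQLite file as a stand-in for the real server (assumption)
db_file <- tempfile(fileext = ".sqlite")
pool <- dbPool(RSQLite::SQLite(), dbname = db_file)

# Borrow one real DBI connection from the pool
con <- poolCheckout(pool)
dbWriteTable(con, "iris", iris)

# Same pattern as above, but on the checked-out connection
con_query <- tbl(con, "iris") %>% head(10)
computed <- compute(con_query, name = "iris_computed", temporary = FALSE)

# Hand the connection back, then confirm through the pool that the table exists
poolReturn(con)
table_exists <- dbExistsTable(pool, "iris_computed")
poolClose(pool)
```

Only the compute() step goes through the checked-out connection here, so the rest of the code could keep using the pool object.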
