Creating tables directly in sqldf

Question

This is my original R/sqldf code.

First, I create the data:

table_a <- data.frame(name = c('john', 'john', 'john', 'alex', 'alex', 'tim', 'tim', 'joe', 'joe', 'jessica', 'jessica'),
                      year = c(2010, 2011, 2012, 2020, 2021, 2015, 2016, 2010, 2011, 2000, 2001),
                      var = c('a', 'a', 'c', 'b', 'c', NA, NA, NA, NA, NA, NA))

table_b <- data.frame(name = c('sara', 'sara', 'tim', 'tim', 'tim', 'jessica'),
                      year = c(2001, 2002, 2005, 2006, 2021, 2020),
                      var = c('a', 'b', 'c', 'd', 'f', 'z'))

Next, I run the query:

library(sqldf)

sqldf("WITH min_year AS (
          SELECT name
                 , MIN(year) AS min_year
          FROM table_a
          GROUP BY name
      )

, b_filtered AS (
            SELECT b.name
                   , MAX(b.year) AS max_year
                   , b.var      
            FROM table_b AS b
              INNER JOIN min_year AS m
                ON b.name = m.name 
                   AND b.year < m.min_year
            GROUP BY b.name
      )

SELECT a.name
       , a.year 
       , CASE WHEN a.var IS NULL AND b.name IS NOT NULL THEN b.var
          ELSE a.var
          END AS var_mod
FROM table_a AS a
  LEFT JOIN b_filtered b
    ON a.name = b.name")

Is it possible to merge the data creation step and the SQL into a single block of code? For example:

sqldf("WITH 
  table_a (name, year, var) AS 
  (
    VALUES
      ('john',    2010, 'a' )
    , ('john',    2011, 'a' )
    , ('john',    2012, 'c' )
    , ('alex',    2020, 'b' )
    , ('alex',    2021, 'c' )
    , ('tim',     2015, NULL)
    , ('tim',     2016, NULL)
    , ('joe',     2010, NULL)
    , ('joe',     2011, NULL)
    , ('jessica', 2000, NULL)
    , ('jessica', 2001, NULL)
  )
, table_b (name, year, var) AS
  (
    VALUES
      ('sara',    2001, 'a')
    , ('sara',    2002, 'b')
    , ('tim',     2005, 'c')
    , ('tim',     2006, 'd')
    , ('tim',     2021, 'f')
    , ('jessica', 2020, 'z')
  )
WITH min_year AS (
              SELECT name
                     , MIN(year) AS min_year
              FROM table_a
              GROUP BY name
          )
    
    , b_filtered AS (
                SELECT b.name
                       , MAX(b.year) AS max_year
                       , b.var      
                FROM table_b AS b
                  INNER JOIN min_year AS m
                    ON b.name = m.name 
                       AND b.year < m.min_year
                GROUP BY b.name
          )
    
    SELECT a.name
           , a.year 
           , CASE WHEN a.var IS NULL AND b.name IS NOT NULL THEN b.var
              ELSE a.var
              END AS var_mod
    FROM table_a AS a
      LEFT JOIN b_filtered b
        ON a.name = b.name")

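Creating the data outside the sqldf statement and then running the sqldf code works fine, but I would like to know whether the two can be combined into a single block of code.

That would make testing and debugging much easier.

Is this possible?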

r sqldf
1 Answer

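The problem is that the last sqldf statement in the question has a syntax error. A statement takes only one WITH clause at its top level, and all of its common table expressions are separated by commas. Replace the second WITH in that statement with a comma.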
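
Applying that fix to the combined statement from the question gives something like the sketch below. It assumes sqldf is using its default SQLite backend and that the bundled SQLite version accepts VALUES inside a common table expression. The variable name res is only for illustration.

```r
library(sqldf)

# Single WITH clause: the two inline data tables and the two derived
# CTEs are all separated by commas (no second WITH keyword).
res <- sqldf("WITH
  table_a (name, year, var) AS
  (
    VALUES
      ('john',    2010, 'a' )
    , ('john',    2011, 'a' )
    , ('john',    2012, 'c' )
    , ('alex',    2020, 'b' )
    , ('alex',    2021, 'c' )
    , ('tim',     2015, NULL)
    , ('tim',     2016, NULL)
    , ('joe',     2010, NULL)
    , ('joe',     2011, NULL)
    , ('jessica', 2000, NULL)
    , ('jessica', 2001, NULL)
  )
, table_b (name, year, var) AS
  (
    VALUES
      ('sara',    2001, 'a')
    , ('sara',    2002, 'b')
    , ('tim',     2005, 'c')
    , ('tim',     2006, 'd')
    , ('tim',     2021, 'f')
    , ('jessica', 2020, 'z')
  )
, min_year AS (
    -- earliest year per name in table_a
    SELECT name
           , MIN(year) AS min_year
    FROM table_a
    GROUP BY name
  )
, b_filtered AS (
    -- latest table_b row per name that precedes that earliest year
    SELECT b.name
           , MAX(b.year) AS max_year
           , b.var
    FROM table_b AS b
      INNER JOIN min_year AS m
        ON b.name = m.name
           AND b.year < m.min_year
    GROUP BY b.name
  )
SELECT a.name
       , a.year
       , CASE WHEN a.var IS NULL AND b.name IS NOT NULL THEN b.var
              ELSE a.var
         END AS var_mod
FROM table_a AS a
  LEFT JOIN b_filtered b
    ON a.name = b.name")

print(res)
```

Because table_a and table_b are defined inside the query, no R data frames are needed to run it. Note that selecting b.var next to MAX(b.year) without aggregating it relies on SQLite's handling of bare columns in aggregate queries, as the original query does; other database backends may reject it or behave differently.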
