Combining inner joins and BETWEEN statements in the same query

Problem description

I have these two tables:

CREATE TABLE table_a (
  name VARCHAR(255),
  date_1 DATE
);


INSERT INTO table_a (name, date_1) VALUES
('john', '2010-01-01'),
('john', '2012-02-01'),
('john', '2017-08-01'),
('sara', '2008-04-01'),
('sara', '2011-04-01'),
('tim', '2000-01-01'),
('tim', '2001-01-01'),
('alex', '2013-01-01');


CREATE TABLE table_b (
  name VARCHAR(255),
  date_2 DATE,
  date_3 DATE,
  var CHAR(1)
);


INSERT INTO table_b (name, date_2, date_3, var) VALUES
('john', '2001-01-01', '2015-01-01', 'b'),
('sara', '2000-01-01', '2015-01-01', 'c'),
('sara', '2015-01-02', '2022-01-01', 'a'),
('tim', '2020-01-01', '2021-01-01', 'a'),
('john', '1998-01-01', '1999-01-01', 'd');

They look like this:

# table_a
 name     date_1
 john 2010-01-01
 john 2012-02-01
 john 2017-08-01
 sara 2008-04-01
 sara 2011-04-01
  tim 2000-01-01
  tim 2001-01-01
 alex 2013-01-01

# table_b
 name     date_2     date_3 var
 john 2001-01-01 2015-01-01   b
 sara 2000-01-01 2015-01-01   c
 sara 2015-01-02 2022-01-01   a
  tim 2020-01-01 2021-01-01   a
 john 1998-01-01 1999-01-01   d

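This is what I am trying to achieve:

Part 1: Exact join

  • For rows in table_a and table_b (matched on name), check whether date_1 falls between a (date_2, date_3) pair in table_b. If it does, join them.

Part 2: Window join

  • For rows not joined in Part 1 (these may include names already analyzed in Part 1), check whether the name appears in table_b
  • If it does, check whether table_b has a row for that name whose date_2 falls before date_1
  • If so, join to the closest earlier row relative to date_1
  • Otherwise, do not join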
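
To pin down what the two parts mean, here is a minimal reference sketch in plain Python over the sample data. It is only one reading of the rules: it assumes "closest" means the greatest date_2 strictly before date_1, and that if several intervals contain date_1 only the one with the latest date_2 is kept. The function name match_var is hypothetical.

```python
from datetime import date

table_a = [
    ("john", date(2010, 1, 1)), ("john", date(2012, 2, 1)),
    ("john", date(2017, 8, 1)), ("sara", date(2008, 4, 1)),
    ("sara", date(2011, 4, 1)), ("tim", date(2000, 1, 1)),
    ("tim", date(2001, 1, 1)), ("alex", date(2013, 1, 1)),
]
table_b = [
    ("john", date(2001, 1, 1), date(2015, 1, 1), "b"),
    ("sara", date(2000, 1, 1), date(2015, 1, 1), "c"),
    ("sara", date(2015, 1, 2), date(2022, 1, 1), "a"),
    ("tim", date(2020, 1, 1), date(2021, 1, 1), "a"),
    ("john", date(1998, 1, 1), date(1999, 1, 1), "d"),
]


def match_var(name, date_1, rows_b):
    """Return the var joined to one table_a row, or None if nothing joins."""
    same_name = [r for r in rows_b if r[0] == name]
    # Part 1 (exact join): date_1 lies inside [date_2, date_3].
    exact = [r for r in same_name if r[1] <= date_1 <= r[2]]
    if exact:
        # Assumption: with several containing intervals, keep the latest date_2.
        return max(exact, key=lambda r: r[1])[3]
    # Part 2 (window join): rows whose date_2 is strictly before date_1.
    earlier = [r for r in same_name if r[1] < date_1]
    if earlier:
        # Assumption: "closest" means the greatest such date_2.
        return max(earlier, key=lambda r: r[1])[3]
    return None


result = [(n, d.isoformat(), match_var(n, d, table_b)) for n, d in table_a]
for row in result:
    print(row)
```

Under this reading, john's 2017-08-01 row has no containing interval, so it falls through to Part 2 and picks up the row whose date_2 is 2001-01-01.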

This is the code I am using:

-- random ID approach (faster)

WITH exact_join AS (
  SELECT a.*, b.var, random() as random_id
  FROM table_a a
  LEFT JOIN table_b b ON a.name = b.name AND a.date_1 BETWEEN b.date_2 AND b.date_3
),
window_join AS (
  SELECT a.*, b.var
  FROM table_a a
  LEFT JOIN (
    SELECT name, var, date_2, ROW_NUMBER() OVER (PARTITION BY name ORDER BY date_2 DESC) as rn
    FROM table_b
  ) b ON a.name = b.name AND a.date_1 > b.date_2
  WHERE b.rn = 1 AND a.random_id NOT IN (SELECT random_id FROM exact_join)
)
SELECT * FROM exact_join
UNION ALL
SELECT * FROM window_join;

-- non-random ID approach (slower)

WITH exact_join AS (
  SELECT a.*, b.var, ROW_NUMBER() OVER (ORDER BY 1) as id
  FROM table_a a
  LEFT JOIN table_b b ON a.name = b.name AND a.date_1 BETWEEN b.date_2 AND b.date_3
),
window_join AS (
  SELECT a.*, b.var
  FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY 1) as id
    FROM table_a
  ) a
  LEFT JOIN (
    SELECT name, var, date_2, ROW_NUMBER() OVER (PARTITION BY name ORDER BY date_2 DESC) as rn
    FROM table_b
  ) b ON a.name = b.name AND a.date_1 > b.date_2
  WHERE b.rn = 1 AND a.id NOT IN (SELECT id FROM exact_join)
)
SELECT * FROM exact_join
UNION ALL
SELECT * FROM window_join;

I believe the code gives the correct output:

  name     date_1  var id
 john 2010-01-01    b  1
 john 2012-02-01    b  2
 john 2017-08-01 <NA>  3
 sara 2008-04-01    c  4
 sara 2011-04-01    c  5
  tim 2000-01-01 <NA>  6
  tim 2001-01-01 <NA>  7
 alex 2013-01-01 <NA>  8

Is this a correct use of the random() function to prevent rows from being analyzed twice or skipped?
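
One way to check the exclusion logic independently of how the ID is generated: the key has to be a value that both steps can see on the same table_a row, and the first step has to contain only rows that actually matched. The sketch below uses Python's standard sqlite3 module, so it runs on SQLite rather than Db2, stores dates as ISO text, and uses SQLite's built-in rowid as the stable key. All of these are assumptions made for illustration, and the helper name leftover_rowids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (name TEXT, date_1 TEXT);
INSERT INTO table_a (name, date_1) VALUES
('john', '2010-01-01'), ('john', '2012-02-01'), ('john', '2017-08-01'),
('sara', '2008-04-01'), ('sara', '2011-04-01'),
('tim', '2000-01-01'), ('tim', '2001-01-01'), ('alex', '2013-01-01');

CREATE TABLE table_b (name TEXT, date_2 TEXT, date_3 TEXT, var TEXT);
INSERT INTO table_b (name, date_2, date_3, var) VALUES
('john', '2001-01-01', '2015-01-01', 'b'),
('sara', '2000-01-01', '2015-01-01', 'c'),
('sara', '2015-01-02', '2022-01-01', 'a'),
('tim', '2020-01-01', '2021-01-01', 'a'),
('john', '1998-01-01', '1999-01-01', 'd');
""")


def leftover_rowids(join_kind):
    """table_a rowids NOT covered by the exact step, for a given join kind."""
    sql = f"""
        SELECT a.rowid
        FROM table_a a
        WHERE a.rowid NOT IN (
            SELECT a2.rowid
            FROM table_a a2
            {join_kind} JOIN table_b b
              ON a2.name = b.name
             AND a2.date_1 BETWEEN b.date_2 AND b.date_3
        )
        ORDER BY a.rowid
    """
    return [r[0] for r in conn.execute(sql)]


# LEFT JOIN keeps unmatched table_a rows too, so every rowid is "covered".
left_leftover = leftover_rowids("LEFT")
# INNER JOIN keeps only real matches, so unmatched rows remain for step two.
inner_leftover = leftover_rowids("INNER")
print(left_leftover)
print(inner_leftover)
```

With a LEFT JOIN in the exact step, the exclusion list contains every table_a row and nothing is left for the second step, which would be consistent with john's 2017-08-01 row showing <NA> in the output above. With an INNER JOIN, the unmatched rows stay available for the window step.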

Tags: db2

1 answer

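According to this rule, I think the result you show is incorrect:

If it does, check whether table_b has a row for that name whose date_2 falls before date_1

In this case, John's third occurrence should have a matching value rather than a null. But which one, b or d? I chose the latest one by date (b).

You can do: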

select a.name, a.date_1, b.var
from table_a a
left join lateral (
  select * from table_b b where b.name = a.name and b.date_2 < a.date_1
  order by b.date_2 desc fetch next 1 rows only
) b on 1 = 1

Result:

 NAME  DATE_1      VAR  
 ----- ----------- ---- 
 john  2010-01-01  b    
 john  2012-02-01  b    
 john  2017-08-01  b    
 sara  2008-04-01  c    
 sara  2011-04-01  c    
 tim   2000-01-01  null 
 tim   2001-01-01  null 
 alex  2013-01-01  null 

See the running example at db<>fiddle.
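
The lateral query always takes the row with the latest date_2 before date_1. If intervals for the same name can overlap, that row is not necessarily the one whose (date_2, date_3) range contains date_1, so a variant that tries Part 1 first and only then falls back to Part 2 may be closer to the stated rules. Below is a minimal sketch of that idea, run through Python's standard sqlite3 module with correlated scalar subqueries in place of the lateral join. SQLite, ISO-text dates, LIMIT 1 and rowid ordering are assumptions of this sketch, not Db2 syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (name TEXT, date_1 TEXT);
INSERT INTO table_a (name, date_1) VALUES
('john', '2010-01-01'), ('john', '2012-02-01'), ('john', '2017-08-01'),
('sara', '2008-04-01'), ('sara', '2011-04-01'),
('tim', '2000-01-01'), ('tim', '2001-01-01'), ('alex', '2013-01-01');

CREATE TABLE table_b (name TEXT, date_2 TEXT, date_3 TEXT, var TEXT);
INSERT INTO table_b (name, date_2, date_3, var) VALUES
('john', '2001-01-01', '2015-01-01', 'b'),
('sara', '2000-01-01', '2015-01-01', 'c'),
('sara', '2015-01-02', '2022-01-01', 'a'),
('tim', '2020-01-01', '2021-01-01', 'a'),
('john', '1998-01-01', '1999-01-01', 'd');
""")

query = """
SELECT a.name,
       a.date_1,
       COALESCE(
           -- Part 1: an interval that contains date_1
           (SELECT b.var FROM table_b b
             WHERE b.name = a.name
               AND a.date_1 BETWEEN b.date_2 AND b.date_3
             ORDER BY b.date_2 DESC LIMIT 1),
           -- Part 2: otherwise the latest row starting before date_1
           (SELECT b.var FROM table_b b
             WHERE b.name = a.name
               AND b.date_2 < a.date_1
             ORDER BY b.date_2 DESC LIMIT 1)
       ) AS var
FROM table_a a
ORDER BY a.rowid
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the sample data this yields the same var values as the result listed above. One caveat of the COALESCE form: if an exact match exists but its var is NULL, the query falls through to the Part 2 subquery.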
