AWS Athena 希望加入表一次而不是 3 次

问题描述 投票:0回答:1

您能否帮忙重写 AWS Athena 上的 SQL,以使用具有 1 次连接而不是 3 次连接的表 我需要得到结果:

with t1 as (
    select 1 id, 1 id1, 2 id2, 3 id3
    union all
    select 2 id, 4 id1, 2 id2, 4
    union all
    select 3 id, 4 id1, 4 id2, 1),
t2 as (
    select 1 id, 'Text1' txt
    union all
    select 2 id, 'Text2' txt
    union all
    select 3 id, 'Text3' txt)
select t1.*, 
  coalesce(t2.id,t3.id,t4.id) t2_id,
  coalesce(t2.txt,t3.txt,t4.txt) t2_txt
from t1
    left join t2 on t1.id1 = t2.id
    left join t2 t3 on t1.id2 = t3.id and t2.id is null
    left join t2 t4 on t1.id3 = t4.id and t2.id is null and t3.id is null

所需结果:

Result

我已经尝试过:

with t1 as (
    select 1 id, 1 id1, 2 id2, 3 id3
    union all
    select 2 id, 4 id1, 2 id2, 4
    union all
    select 3 id, 4 id1, 4 id2, 1),
t2 as (
    select 1 id, 'Text1' txt
    union all
    select 2 id, 'Text2' txt
    union all
    select 3 id, 'Text3' txt)
select t1.*, 
   t2.id  t2_id,
   t2.txt t2_txt
from t1
    left join t2 on 
      case 
        when t1.id1 = t2.id then t2.id -- First condition (1)
        when t1.id2 = t2.id then t2.id -- Should be skiped if 1 is true (2)
        when t1.id3 = t2.id then t2.id -- Should be skiped if 1 or 2 is true (3)
      end = t2.id
order by t1.id,t2.id

请您指教一下。 预先感谢您!!!

sql amazon-web-services amazon-athena presto trino
1个回答
0
投票

不确定这对性能是否有帮助,但您可以尝试在连接条件中使用

or
,然后按 id 分组(或者您可以使用
row_number
引入代理唯一 id),然后使用
max_by
来选择项目与“最早”的 id 匹配:

-- sample data
with t1(id, id1, id2, id3) as (
    values (1, 1, 2, 3),
     (2, 4, 2, 4),
     (3, 4, 4, 1)),
t2 (id, txt) as (
    values (1, 'Text1'),
    (2, 'Text2'),
    (3, 'Text3'))

-- query
select id, id1, id2, id3,
       max_by(t2_id, id_matched) t2_id,
       max_by(t2_txt, id_matched) t2_txt
from(
    select t1.*,
      t2.id t2_id,
      t2.txt t2_txt,
      case t2.id
        when t1.id1 then 3
        when t1.id2 then 2
        when t1.id3 then 1
      end id_matched -- surrogate order based on "first" matched id
    from t1
        left join t2 on t1.id1 = t2.id
        or (t1.id2 = t2.id and t1.id1 != t2.id)
        or (t1.id3 = t2.id and t1.id1 != t2.id and t1.id2 != t2.id)
    where t2.id is not null
    )
group by id, id1, id2, id3;

输出:

id id1 id2 id3 t2_id t2_txt
3 4 4 1 1 文字1
1 1 2 3 1 文字1
2 4 2 4 2 文字2
© www.soinside.com 2019 - 2024. All rights reserved.