AWS Athena: join a table once instead of 3 times

Question

Could you help me rewrite this AWS Athena SQL so that it joins table t2 once instead of three times? This query gives me the result I need:

with t1 as (
    select 1 id, 1 id1, 2 id2, 3 id3
    union all
    select 2 id, 4 id1, 2 id2, 4
    union all
    select 3 id, 4 id1, 4 id2, 1),
t2 as (
    select 1 id, 'Text1' txt
    union all
    select 2 id, 'Text2' txt
    union all
    select 3 id, 'Text3' txt)
select t1.*, 
  coalesce(t2.id,t3.id,t4.id) t2_id,
  coalesce(t2.txt,t3.txt,t4.txt) t2_txt
from t1
    left join t2 on t1.id1 = t2.id
    left join t2 t3 on t1.id2 = t3.id and t2.id is null
    left join t2 t4 on t1.id3 = t4.id and t2.id is null and t3.id is null
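To have a baseline to compare rewrites against, here is a minimal sketch that runs the three-join query above locally. It assumes Python's built-in sqlite3 module as a stand-in for Athena (the query uses nothing Athena-specific). The helper name run_baseline and the trailing order by t1.id, added only to make the row order deterministic, are illustrative additions, not part of the original question.

```python
import sqlite3

# The question's three-join query. sqlite3 is only a local stand-in for
# Athena (assumption); "order by t1.id" was added for a stable row order.
QUERY = """
with t1 as (
    select 1 id, 1 id1, 2 id2, 3 id3
    union all
    select 2 id, 4 id1, 2 id2, 4
    union all
    select 3 id, 4 id1, 4 id2, 1),
t2 as (
    select 1 id, 'Text1' txt
    union all
    select 2 id, 'Text2' txt
    union all
    select 3 id, 'Text3' txt)
select t1.*,
  coalesce(t2.id, t3.id, t4.id) t2_id,
  coalesce(t2.txt, t3.txt, t4.txt) t2_txt
from t1
    left join t2 on t1.id1 = t2.id
    left join t2 t3 on t1.id2 = t3.id and t2.id is null
    left join t2 t4 on t1.id3 = t4.id and t2.id is null and t3.id is null
order by t1.id
"""


def run_baseline():
    """Hypothetical helper: return the rows of the original three-join query."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_baseline():
        print(row)
```

Each t1 row comes back exactly once, carrying the t2 row matched by the first of id1, id2, id3 that has a match.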

Desired result:

What I have tried:

with t1 as (
    select 1 id, 1 id1, 2 id2, 3 id3
    union all
    select 2 id, 4 id1, 2 id2, 4
    union all
    select 3 id, 4 id1, 4 id2, 1),
t2 as (
    select 1 id, 'Text1' txt
    union all
    select 2 id, 'Text2' txt
    union all
    select 3 id, 'Text3' txt)
select t1.*, 
   t2.id  t2_id,
   t2.txt t2_txt
from t1
    left join t2 on 
      case 
        when t1.id1 = t2.id then t2.id -- First condition (1)
        when t1.id2 = t2.id then t2.id -- Should be skipped if 1 is true (2)
        when t1.id3 = t2.id then t2.id -- Should be skipped if 1 or 2 is true (3)
      end = t2.id
order by t1.id,t2.id
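This attempt cannot skip anything, because a join condition is evaluated independently for each (t1 row, t2 row) pair: the CASE has no way of knowing whether a different t2 row already matched an earlier column. For a matching pair it returns t2.id, which trivially equals t2.id, and for a non-matching pair it returns NULL, so the condition reduces to t2.id in (t1.id1, t1.id2, t1.id3) and one t1 row can join several t2 rows. The sketch below, again assuming Python's sqlite3 as a local stand-in for Athena (run_attempt is a hypothetical helper name), shows t1 row 1 coming back three times.

```python
import sqlite3

# The question's attempted single-join query, run on sqlite3 as a local
# stand-in for Athena (assumption). The CASE is evaluated per row pair,
# so it behaves like "t2.id in (t1.id1, t1.id2, t1.id3)".
QUERY = """
with t1 as (
    select 1 id, 1 id1, 2 id2, 3 id3
    union all
    select 2 id, 4 id1, 2 id2, 4
    union all
    select 3 id, 4 id1, 4 id2, 1),
t2 as (
    select 1 id, 'Text1' txt
    union all
    select 2 id, 'Text2' txt
    union all
    select 3 id, 'Text3' txt)
select t1.*,
   t2.id  t2_id,
   t2.txt t2_txt
from t1
    left join t2 on
      case
        when t1.id1 = t2.id then t2.id
        when t1.id2 = t2.id then t2.id
        when t1.id3 = t2.id then t2.id
      end = t2.id
order by t1.id, t2.id
"""


def run_attempt():
    """Hypothetical helper: return the rows of the attempted query."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_attempt():
        print(row)
```

Row 1 (id1=1, id2=2, id3=3) matches every t2 row, so it appears once per t2 row instead of once in total.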

Could you please advise? Thanks in advance!

Answer 1

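Not sure whether this helps performance, but you can try using `or` in the join condition, then grouping by id (or introducing a surrogate unique id with `row_number`), and then using `max_by` to pick the item matched by the "earliest" id: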

-- sample data
with t1(id, id1, id2, id3) as (
    values (1, 1, 2, 3),
     (2, 4, 2, 4),
     (3, 4, 4, 1)),
t2 (id, txt) as (
    values (1, 'Text1'),
    (2, 'Text2'),
    (3, 'Text3'))

-- query
select id, id1, id2, id3,
       max_by(t2_id, id_matched) t2_id,
       max_by(t2_txt, id_matched) t2_txt
from(
    select t1.*,
      t2.id t2_id,
      t2.txt t2_txt,
      case t2.id
        when t1.id1 then 3
        when t1.id2 then 2
        when t1.id3 then 1
      end id_matched -- surrogate order based on "first" matched id
    from t1
        left join t2 on t1.id1 = t2.id
        or (t1.id2 = t2.id and t1.id1 != t2.id)
        or (t1.id3 = t2.id and t1.id1 != t2.id and t1.id2 != t2.id)
    where t2.id is not null
    )
group by id, id1, id2, id3;

Output:

id	id1	id2	id3	t2_id	t2_txt
3	4	4	1	1	Text1
1	1	2	3	1	Text1
2	4	2	4	2	Text2
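For comparison, here is a hedged sketch of the same single-join idea with two changes. First, the join condition is simply t2.id in (t1.id1, t1.id2, t1.id3): the != guards are not needed, because the CASE ranking already picks the winner, and if id1 is NULL then t1.id1 != t2.id evaluates to NULL, so a match on id2 or id3 is silently lost. Second, the best match is chosen with row_number() instead of max_by, and the where t2.id is not null filter is dropped, so t1 rows with no match are kept with NULLs, as in the original left joins. It runs on Python's sqlite3 as a stand-in for Athena (an assumption: SQLite has no max_by, and window functions need SQLite 3.25 or newer). single_join, T1 and T2 are hypothetical names, and the sketch assumes id is unique in t1 and in t2.

```python
import sqlite3

# Single join + ranking. Assumptions: sqlite3 stands in for Athena,
# t1.id and t2.id are unique, rank 1 = matched via id1, 2 = id2, 3 = id3.
QUERY = """
with ranked as (
    select t1.id, t1.id1, t1.id2, t1.id3,
           t2.id  as t2_id,
           t2.txt as t2_txt,
           row_number() over (
               partition by t1.id
               order by case t2.id
                          when t1.id1 then 1
                          when t1.id2 then 2
                          when t1.id3 then 3
                        end
           ) as rn
    from t1
        left join t2 on t2.id in (t1.id1, t1.id2, t1.id3)
)
select id, id1, id2, id3, t2_id, t2_txt
from ranked
where rn = 1
order by id
"""


def single_join(t1_rows, t2_rows):
    """Hypothetical helper: load t1/t2 into an in-memory DB and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t1 (id integer, id1 integer, id2 integer, id3 integer)")
        conn.execute("create table t2 (id integer, txt text)")
        conn.executemany("insert into t1 values (?, ?, ?, ?)", t1_rows)
        conn.executemany("insert into t2 values (?, ?)", t2_rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Sample data from the question.
T1 = [(1, 1, 2, 3), (2, 4, 2, 4), (3, 4, 4, 1)]
T2 = [(1, "Text1"), (2, "Text2"), (3, "Text3")]

if __name__ == "__main__":
    for row in single_join(T1, T2):
        print(row)
```

On the sample data this returns the same rows as the three-join query. In Athena itself the simpler in (...) condition could presumably be combined with the answer's max_by aggregation instead of row_number; that variant is not exercised here.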
