Athena/Iceberg MERGE INTO does not apply both DELETE and UPDATE on the same key

I want to merge a staging table into the main table. The staging table looks like this:

name | id | keys_id | event_name | event_time
a    | 1  | 1       | INSERT     | 1
b    | 1  | 1       | MODIFY     | 1
     |    | 1       | REMOVE     | 1

I merge it with this query:

MERGE INTO new_db_test.test_iceberg_table AS target
USING new_db_test.staging_table AS source
ON target.id = source.keys_id
WHEN MATCHED AND source.event_name = 'MODIFY' 
THEN UPDATE SET name = source.name, id = source.id
WHEN MATCHED AND source.event_name = 'REMOVE'
THEN DELETE
WHEN NOT MATCHED AND source.event_name = 'INSERT' 
THEN INSERT (id, name) 
VALUES (source.id, source.name)

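I expect test_iceberg_table to be empty afterwards, but the delete event is not applied and the table still contains the row with the updated values: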

name | id 
b    | 1  

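When I remove the update clause, the record is deleted as expected. What could be going wrong here? I would also appreciate pointers to resources where I can learn how Iceberg applies MERGE INTO.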

sql apache-spark-sql amazon-athena apache-iceberg
1 Answer

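Each target row is affected at most once. If several source rows match the same target row, a single arbitrary source row is chosen. (Some DBMSs raise an error when this happens.)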
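To check whether a staging table is affected by this, one can list the keys that have more than one source row. The following is a minimal sketch using the table and column names from the question; the alias `events_per_key` is only illustrative.

```sql
-- List staging keys that would match the same target row more than once.
SELECT s.keys_id,
       COUNT(*) AS events_per_key
FROM new_db_test.staging_table AS s
GROUP BY s.keys_id
HAVING COUNT(*) > 1
```

For the sample data in the question, this reports `keys_id = 1` with three events, which is exactly the situation where only one of them ends up being applied.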

So you need to reduce your source to one row per key beforehand, filtering it so that REMOVE takes priority.

WITH source AS (
    SELECT s.*
    FROM (
        SELECT s.*,
          ROW_NUMBER() OVER (PARTITION BY s.keys_id ORDER BY s.event_name DESC) AS rn
        FROM new_db_test.staging_table AS s
    ) s
    WHERE s.rn = 1
)
MERGE INTO new_db_test.test_iceberg_table AS target
USING source
ON target.id = source.keys_id
WHEN MATCHED AND source.event_name = 'MODIFY' 
  THEN UPDATE SET
    name = source.name,
    id = source.id
WHEN MATCHED AND source.event_name = 'REMOVE'
  THEN DELETE
WHEN NOT MATCHED AND source.event_name = 'INSERT' 
  THEN INSERT
    (id, name) 
  VALUES
    (source.id, source.name);
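The ordering above works because `REMOVE` sorts after `MODIFY` and `INSERT` alphabetically, so `DESC` puts it first. A variant that spells the priority out with a `CASE` expression and places the deduplicating subquery directly in `USING` might look like the following. This is a sketch that has not been tested against Athena: the numeric priorities are my own choice, and the inline subquery is used on the assumption that the engine may not accept a `WITH` clause in front of `MERGE`.

```sql
MERGE INTO new_db_test.test_iceberg_table AS target
USING (
    SELECT d.name, d.id, d.keys_id, d.event_name
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (
                   PARTITION BY s.keys_id
                   -- Assumed priority: REMOVE beats MODIFY beats INSERT.
                   ORDER BY CASE s.event_name
                                WHEN 'REMOVE' THEN 1
                                WHEN 'MODIFY' THEN 2
                                ELSE 3
                            END
               ) AS rn
        FROM new_db_test.staging_table AS s
    ) AS d
    -- Keep only the highest-priority event per key.
    WHERE d.rn = 1
) AS source
ON target.id = source.keys_id
WHEN MATCHED AND source.event_name = 'MODIFY'
  THEN UPDATE SET name = source.name, id = source.id
WHEN MATCHED AND source.event_name = 'REMOVE'
  THEN DELETE
WHEN NOT MATCHED AND source.event_name = 'INSERT'
  THEN INSERT (id, name)
  VALUES (source.id, source.name)
```

If events for the same key can carry different `event_time` values, ordering by `event_time DESC` before the priority may be closer to the intent. That depends on semantics the question does not state.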