I want to merge a staging table into a main table. The staging table looks like this:
name | id | keys_id | event_name | event_time
a | 1 | 1 | INSERT | 1
b | 1 | 1 | MODIFY | 1
| | 1 | REMOVE | 1
I am merging it with this query:
MERGE INTO new_db_test.test_iceberg_table AS target USING new_db_test.staging_table
AS source
ON target.id = source.keys_id
WHEN MATCHED AND source.event_name = 'MODIFY'
THEN UPDATE SET name = source.name, id = source.id
WHEN MATCHED AND source.event_name = 'REMOVE'
THEN DELETE
WHEN NOT MATCHED AND source.event_name = 'INSERT'
THEN INSERT (id, name)
VALUES (source.id, source.name)
test_iceberg_table should be empty, but the delete event was not applied, and the table still contains the row with the updated values:
name | id
b | 1
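When I remove the update condition, the record is deleted as expected. What could be going wrong here? I would also appreciate any resources where I can learn how Iceberg applies MERGE INTO.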
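Each target row is affected only once: if multiple source rows match it, a single arbitrary source row is chosen. (In some DBMSs, an error is raised when this happens.)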
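One way to check whether a staging table is affected is to count source rows per join key. The following is a minimal sketch, not the engine's own behaviour: it assumes an in-memory SQLite database as a stand-in for the real engine and recreates only the three staging rows from the question.

```python
import sqlite3

# Assumption: SQLite stands in for the real engine; the schema mirrors the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_table "
    "(name TEXT, id INTEGER, keys_id INTEGER, event_name TEXT, event_time INTEGER)"
)
conn.executemany(
    "INSERT INTO staging_table VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 1, "INSERT", 1),
        ("b", 1, 1, "MODIFY", 1),
        (None, None, 1, "REMOVE", 1),
    ],
)

# Keys that appear more than once would each match the same target row several times.
duplicate_keys = conn.execute(
    """
    SELECT keys_id, COUNT(*) AS n
    FROM staging_table
    GROUP BY keys_id
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicate_keys)  # [(1, 3)]
```

Here keys_id 1 has three source rows, so only one of them can take effect on the matching target row.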
So you need to pre-aggregate your source down to one row per keys_id, filtering it so that REMOVE takes priority.
WITH source AS (
SELECT s.*
FROM (
SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY s.keys_id ORDER BY s.event_name DESC) AS rn
FROM new_db_test.staging_table AS s
) s
WHERE s.rn = 1
)
MERGE INTO new_db_test.test_iceberg_table AS target
USING source
ON target.id = source.keys_id
WHEN MATCHED AND source.event_name = 'MODIFY'
THEN UPDATE SET
name = source.name,
id = source.id
WHEN MATCHED AND source.event_name = 'REMOVE'
THEN DELETE
WHEN NOT MATCHED AND source.event_name = 'INSERT'
THEN INSERT
(id, name)
VALUES
(source.id, source.name);
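To see which row the deduplication step keeps, the inner query can be run on its own. This is a hedged sketch that assumes an in-memory SQLite database (version 3.25 or later, for window functions) as a stand-in for the real engine, loaded with the three staging rows from the question. It does not run the MERGE itself.

```python
import sqlite3

# Assumption: SQLite stands in for the real engine; the schema mirrors the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_table "
    "(name TEXT, id INTEGER, keys_id INTEGER, event_name TEXT, event_time INTEGER)"
)
conn.executemany(
    "INSERT INTO staging_table VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 1, "INSERT", 1),
        ("b", 1, 1, "MODIFY", 1),
        (None, None, 1, "REMOVE", 1),
    ],
)

# Same deduplication as the CTE above: rank rows per key, keep the first.
deduplicated = conn.execute(
    """
    SELECT s.name, s.id, s.keys_id, s.event_name
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (
                   PARTITION BY s.keys_id
                   ORDER BY s.event_name DESC
               ) AS rn
        FROM staging_table AS s
    ) s
    WHERE s.rn = 1
    """
).fetchall()
print(deduplicated)  # [(None, None, 1, 'REMOVE')]
```

Ordering by event_name DESC puts REMOVE first only because 'REMOVE' sorts after 'MODIFY' and 'INSERT' alphabetically. If other event names are possible, or if the latest event should win, an explicit CASE expression or event_time in the ORDER BY may be a safer choice.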