Trino/Presto SQL: Replace NULL with a value only when the NULL appears after the first non-NULL value in the group


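I want to replace NULL values with a specified string. However, I only want to make this replacement for NULL values that come after the first non-NULL value within each user_id. That is, if a NULL value comes before the first non-NULL value, keep it as NULL.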

Take this data as an example:

# | user_id | some_date  | animal  |
# |---------|------------|---------|
# | 1       | 2022-01-01 | NULL    | <~~ keep as NULL
# | 1       | 2022-01-02 | zebra   | <~~ 'zebra' is the first non-NULL value for user_id = 1
# | 1       | 2022-01-03 | lion    |
# | 1       | 2022-01-04 | NULL    | <~~ replace NULL with 'no_animal'
# | 1       | 2022-01-05 | cat     |
# | 2       | 2023-10-05 | NULL    | <~~ keep as NULL
# | 2       | 2023-10-06 | NULL    | <~~ keep as NULL
# | 2       | 2023-10-07 | dog     | <~~ 'dog' is the first non-NULL value for user_id = 2
# | 2       | 2023-10-08 | frog    |
# | 2       | 2023-10-09 | NULL    | <~~ replace NULL with 'no_animal'
# | 3       | 2024-02-03 | hamster | <~~ 'hamster' is the first non-NULL value for user_id = 3
# | 3       | 2024-02-04 | rabbit  |
# | 3       | 2024-02-05 | NULL    | <~~ replace NULL with 'no_animal'
# | 3       | 2024-02-06 | NULL    | <~~ replace NULL with 'no_animal'

The desired output is:

# | user_id | some_date  | animal  | replaced_null |
# |---------|------------|---------|---------------|
# | 1       | 2022-01-01 | NULL    | NULL          |
# | 1       | 2022-01-02 | zebra   | zebra         |
# | 1       | 2022-01-03 | lion    | lion          |
# | 1       | 2022-01-04 | NULL    | no_animal     |
# | 1       | 2022-01-05 | cat     | cat           |
# | 2       | 2023-10-05 | NULL    | NULL          |
# | 2       | 2023-10-06 | NULL    | NULL          |
# | 2       | 2023-10-07 | dog     | dog           |
# | 2       | 2023-10-08 | frog    | frog          |
# | 2       | 2023-10-09 | NULL    | no_animal     |
# | 3       | 2024-02-03 | hamster | hamster       |
# | 3       | 2024-02-04 | rabbit  | rabbit        |
# | 3       | 2024-02-05 | NULL    | no_animal     |
# | 3       | 2024-02-06 | NULL    | no_animal     |

SQL dialect

I'm using AWS Athena, which runs on Trino SQL.

Reproducible data

WITH my_tbl AS (
    SELECT *
    FROM (VALUES
        (1, DATE '2022-01-01', NULL),
        (1, DATE '2022-01-02', 'zebra'),
        (1, DATE '2022-01-03', 'lion'),
        (1, DATE '2022-01-04', NULL),
        (1, DATE '2022-01-05', 'cat'),
        (2, DATE '2023-10-05', NULL),
        (2, DATE '2023-10-06', NULL),
        (2, DATE '2023-10-07', 'dog'),
        (2, DATE '2023-10-08', 'frog'),
        (2, DATE '2023-10-09', NULL),
        (3, DATE '2024-02-03', 'hamster'),
        (3, DATE '2024-02-04', 'rabbit'),
        (3, DATE '2024-02-05', NULL),
        (3, DATE '2024-02-06', NULL)
    ) AS t(user_id, some_date, animal)
)
sql window-functions amazon-athena presto trino
1 Answer

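You can try combining COALESCE with a conditional LAG that ignores NULLs. LAG(animal) IGNORE NULLS returns the most recent earlier non-NULL animal in the partition, so it is NULL only while no non-NULL value has appeared yet for that user_id: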

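-- Append this query to the my_tbl CTE from the question
SELECT user_id,
    some_date,
    animal,
    -- LAG ... IGNORE NULLS stays NULL until the first non-NULL animal appears;
    -- if(cond, value) returns NULL when cond is false, so leading NULLs stay NULL
    COALESCE(animal, if(LAG(animal) IGNORE NULLS OVER (PARTITION BY user_id ORDER BY some_date) IS NOT NULL, 'no_animal')) AS replaced_null
FROM my_tbl
ORDER BY user_id, some_date;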
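An alternative sketch (a swapped-in technique, not part of the original answer) avoids IGNORE NULLS by using a running count of non-NULL values. COUNT(animal) counts only non-NULL rows, so within each user_id partition ordered by some_date, the running count is greater than zero exactly when a non-NULL animal has already appeared. The explicit ROWS frame keeps peer rows with the same date from being counted early; it assumes some_date is unique within each user_id, since ties would make the row order ambiguous anyway. It uses standard window-function syntax and should run on Trino, but it has not been verified against a specific Athena engine version.

```sql
WITH my_tbl AS (
    SELECT *
    FROM (VALUES
        (1, DATE '2022-01-01', NULL),
        (1, DATE '2022-01-02', 'zebra'),
        (1, DATE '2022-01-03', 'lion'),
        (1, DATE '2022-01-04', NULL),
        (1, DATE '2022-01-05', 'cat'),
        (2, DATE '2023-10-05', NULL),
        (2, DATE '2023-10-06', NULL),
        (2, DATE '2023-10-07', 'dog'),
        (2, DATE '2023-10-08', 'frog'),
        (2, DATE '2023-10-09', NULL),
        (3, DATE '2024-02-03', 'hamster'),
        (3, DATE '2024-02-04', 'rabbit'),
        (3, DATE '2024-02-05', NULL),
        (3, DATE '2024-02-06', NULL)
    ) AS t(user_id, some_date, animal)
)
SELECT user_id,
    some_date,
    animal,
    CASE
        -- Non-NULL values are kept unchanged
        WHEN animal IS NOT NULL THEN animal
        -- COUNT(animal) ignores NULLs, so a positive running count means
        -- a non-NULL animal has already appeared for this user_id
        WHEN COUNT(animal) OVER (
                 PARTITION BY user_id
                 ORDER BY some_date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
             ) > 0 THEN 'no_animal'
        -- Leading NULLs (before the first non-NULL value) stay NULL
        ELSE NULL
    END AS replaced_null
FROM my_tbl
ORDER BY user_id, some_date;
```

On the sample data this yields the same replaced_null column as the desired output: the two leading NULLs for user_id = 2 and the first row for user_id = 1 have a running count of 0 and stay NULL, while every later NULL is replaced with 'no_animal'.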