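I want to replace NULL values with a specified string. However, I only want to do this for NULLs that come after the first non-NULL value within each user_id (ordered by some_date). In other words, if a NULL comes before the first non-NULL value, it should stay NULL.
Take this data as an example: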
# | user_id | some_date  | animal  |
# |---------|------------|---------|
# | 1       | 2022-01-01 | NULL    | <~~ keep as NULL
# | 1       | 2022-01-02 | zebra   | <~~ 'zebra' is the first non-NULL value for user_id = 1
# | 1       | 2022-01-03 | lion    |
# | 1       | 2022-01-04 | NULL    | <~~ replace NULL with 'no_animal'
# | 1       | 2022-01-05 | cat     |
# | 2       | 2023-10-05 | NULL    | <~~ keep as NULL
# | 2       | 2023-10-06 | NULL    | <~~ keep as NULL
# | 2       | 2023-10-07 | dog     | <~~ 'dog' is the first non-NULL value for user_id = 2
# | 2       | 2023-10-08 | frog    |
# | 2       | 2023-10-09 | NULL    | <~~ replace NULL with 'no_animal'
# | 3       | 2024-02-03 | hamster | <~~ 'hamster' is the first non-NULL value for user_id = 3
# | 3       | 2024-02-04 | rabbit  |
# | 3       | 2024-02-05 | NULL    | <~~ replace NULL with 'no_animal'
# | 3       | 2024-02-06 | NULL    | <~~ replace NULL with 'no_animal'
The expected output is:
# | user_id | some_date  | animal  | replaced_null |
# |---------|------------|---------|---------------|
# | 1       | 2022-01-01 | NULL    | NULL          |
# | 1       | 2022-01-02 | zebra   | zebra         |
# | 1       | 2022-01-03 | lion    | lion          |
# | 1       | 2022-01-04 | NULL    | no_animal     |
# | 1       | 2022-01-05 | cat     | cat           |
# | 2       | 2023-10-05 | NULL    | NULL          |
# | 2       | 2023-10-06 | NULL    | NULL          |
# | 2       | 2023-10-07 | dog     | dog           |
# | 2       | 2023-10-08 | frog    | frog          |
# | 2       | 2023-10-09 | NULL    | no_animal     |
# | 3       | 2024-02-03 | hamster | hamster       |
# | 3       | 2024-02-04 | rabbit  | rabbit        |
# | 3       | 2024-02-05 | NULL    | no_animal     |
# | 3       | 2024-02-06 | NULL    | no_animal     |
I'm using AWS Athena, which runs on Trino SQL.
WITH my_tbl AS (
    SELECT *
    FROM (VALUES
        (1, DATE '2022-01-01', NULL),
        (1, DATE '2022-01-02', 'zebra'),
        (1, DATE '2022-01-03', 'lion'),
        (1, DATE '2022-01-04', NULL),
        (1, DATE '2022-01-05', 'cat'),
        (2, DATE '2023-10-05', NULL),
        (2, DATE '2023-10-06', NULL),
        (2, DATE '2023-10-07', 'dog'),
        (2, DATE '2023-10-08', 'frog'),
        (2, DATE '2023-10-09', NULL),
        (3, DATE '2024-02-03', 'hamster'),
        (3, DATE '2024-02-04', 'rabbit'),
        (3, DATE '2024-02-05', NULL),
        (3, DATE '2024-02-06', NULL)
    ) AS t(user_id, some_date, animal)
)
You can try combining coalesce with a conditional lag that ignores nulls:
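SELECT user_id,
       some_date,
       animal,
       -- Keep non-NULL animals. For a NULL, return 'no_animal' only when an earlier
       -- non-NULL animal exists for this user; otherwise if() yields NULL.
       COALESCE(
           animal,
           IF(LAG(animal) IGNORE NULLS OVER (PARTITION BY user_id ORDER BY some_date) IS NOT NULL, 'no_animal')
       ) AS replaced_null
FROM my_tbl
ORDER BY user_id, some_date;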
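If IGNORE NULLS is not available or you prefer to avoid it, a running count might work as well. This is only a sketch of a different technique, not the lag-based query above: COUNT(animal) skips NULLs, so a windowed count per user_id is greater than zero only once a non-NULL animal has been seen. It assumes some_date is unique within each user_id (with the default window frame, rows sharing a date are counted together), and it uses just the user_id = 1 rows from the question to stay short.

```sql
-- Sketch: a running count of non-NULL animals replaces LAG ... IGNORE NULLS.
-- Assumption: some_date is unique per user_id.
WITH my_tbl AS (
    SELECT *
    FROM (VALUES
        (1, DATE '2022-01-01', NULL),
        (1, DATE '2022-01-02', 'zebra'),
        (1, DATE '2022-01-03', 'lion'),
        (1, DATE '2022-01-04', NULL),
        (1, DATE '2022-01-05', 'cat')
    ) AS t(user_id, some_date, animal)
)
SELECT user_id,
       some_date,
       animal,
       CASE
           -- Non-NULL values pass through unchanged.
           WHEN animal IS NOT NULL THEN animal
           -- COUNT(animal) ignores NULLs, so > 0 means an earlier non-NULL exists.
           WHEN COUNT(animal) OVER (PARTITION BY user_id ORDER BY some_date) > 0 THEN 'no_animal'
           -- No ELSE branch: leading NULLs stay NULL.
       END AS replaced_null
FROM my_tbl
ORDER BY user_id, some_date;
```

The CASE has no ELSE on purpose, so rows before the first non-NULL animal fall through to NULL, which mirrors what the two-argument if() does in the coalesce version.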