I am working with Netezza SQL.
I have the following table:
name year var1 var2
John 2001 a b
John 2002 a a
John 2003 a b
Mary 2001 b a
Mary 2002 a b
Mary 2003 b a
Alice 2001 a b
Alice 2002 b a
Alice 2003 a b
Bob 2001 b a
Bob 2002 b b
Bob 2003 b a
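For anyone who wants to reproduce this, here is a minimal setup sketch. The table name mytable matches the query further down; the column types are only my assumption, since the real table definition is not shown. I used one INSERT per row in case multi-row VALUES lists are not available on your Netezza version.

```sql
-- Assumed column types; adjust them to match the real table.
CREATE TABLE mytable (
    name VARCHAR(20),
    year INTEGER,
    var1 VARCHAR(1),
    var2 VARCHAR(1)
);

-- Sample rows copied from the table above.
INSERT INTO mytable VALUES ('John', 2001, 'a', 'b');
INSERT INTO mytable VALUES ('John', 2002, 'a', 'a');
INSERT INTO mytable VALUES ('John', 2003, 'a', 'b');
INSERT INTO mytable VALUES ('Mary', 2001, 'b', 'a');
INSERT INTO mytable VALUES ('Mary', 2002, 'a', 'b');
INSERT INTO mytable VALUES ('Mary', 2003, 'b', 'a');
INSERT INTO mytable VALUES ('Alice', 2001, 'a', 'b');
INSERT INTO mytable VALUES ('Alice', 2002, 'b', 'a');
INSERT INTO mytable VALUES ('Alice', 2003, 'a', 'b');
INSERT INTO mytable VALUES ('Bob', 2001, 'b', 'a');
INSERT INTO mytable VALUES ('Bob', 2002, 'b', 'b');
INSERT INTO mytable VALUES ('Bob', 2003, 'b', 'a');
```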
I want to answer the following questions:
I wrote this code to see how each person's var1 and var2 change from year to year:
WITH CTE AS (
SELECT
name,
year,
var1,
var2,
LAG(var1, 1) OVER (PARTITION BY name ORDER BY year ASC) AS var1_before,
LEAD(var1, 1) OVER (PARTITION BY name ORDER BY year ASC) AS var1_after,
LAG(var2, 1) OVER (PARTITION BY name ORDER BY year ASC) AS var2_before,
LEAD(var2, 1) OVER (PARTITION BY name ORDER BY year ASC) AS var2_after,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY year ASC) AS row_num
FROM
mytable
)
SELECT
*
FROM
CTE;
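But I don't know where to go from here. I am trying to separate the names whose values changed from the names whose values never changed, but I keep getting confused.
Can someone show me how to do this?
Thanks!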
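To solve this, we can extend the common table expression (CTE) you already wrote with extra logic that detects the first year in which var1 changes for each name, and that also handles names whose var1 never changes over the whole period. This takes a few more window functions plus some conditional logic.
A few notes:
Code: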
WITH CTE AS (
SELECT
name,
year,
var1,
var2,
LAG(var1) OVER (PARTITION BY name ORDER BY year) AS var1_before,
LEAD(var1) OVER (PARTITION BY name ORDER BY year) AS var1_after,
LAG(var2) OVER (PARTITION BY name ORDER BY year) AS var2_before,
LEAD(var2) OVER (PARTITION BY name ORDER BY year) AS var2_after,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY year) AS row_num,
CASE
WHEN LAG(var1) OVER (PARTITION BY name ORDER BY year) IS NOT NULL AND
LAG(var1) OVER (PARTITION BY name ORDER BY year) != var1 THEN 1
ELSE 0
END AS var1_changed_flag
FROM
mytable
),
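RankedChanges AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY name, var1_changed_flag ORDER BY year) AS change_rank,
SUM(var1_changed_flag) OVER (PARTITION BY name) AS total_changes -- every flagged row survives the WHERE below, so this counts all var1 changes for the name
FROM
CTE
WHERE
var1_changed_flag = 1 OR
var1_after IS NULL -- This condition helps to include the last row for each name
)
SELECT
*
FROM
RankedChanges
WHERE
(var1_changed_flag = 1 AND change_rank = 1) OR -- First year in which var1 changed
(var1_after IS NULL AND total_changes = 0) -- Select the last row only if var1 never changed
ORDER BY
name,
year;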
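If one row per name is enough, a more compact alternative might be conditional aggregation over the LAG comparison. This is only a sketch that I have not run on Netezza: it assumes the same mytable, and the output column names first_var1_change_year and var1_status are made up for illustration.

```sql
WITH flagged AS (
    SELECT
        name,
        year,
        var1,
        -- Previous year's var1 for the same name (NULL on the first year).
        LAG(var1) OVER (PARTITION BY name ORDER BY year) AS var1_before
    FROM
        mytable
)
SELECT
    name,
    -- Earliest year whose var1 differs from the previous year; NULL if it never changed.
    MIN(CASE WHEN var1_before <> var1 THEN year END) AS first_var1_change_year,
    CASE
        WHEN MAX(CASE WHEN var1_before <> var1 THEN 1 ELSE 0 END) = 1 THEN 'changed'
        ELSE 'no change'
    END AS var1_status
FROM
    flagged
GROUP BY
    name
ORDER BY
    name;
```

On the sample data this should give 2002 and 'changed' for Alice and Mary, and NULL and 'no change' for Bob and John. The same pattern can be repeated with var2 to get the equivalent columns for that variable.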