I am using Netezza SQL.
I have the following table (called "df"; all dates are in DATE format and look like 2010-01-01 00:00:00):
student var1 start end
1 a 2010-01-01 2013-01-01
1 b 2010-01-01 2013-01-01
1 b 2013-05-05 2015-09-09
1 a 2017-10-10 2018-09-01
2 c 2010-01-01 2014-01-01
2 a 2015-01-01 2017-09-01
2 b 2019-01-01 2023-03-05
My question:
The final result should look like this:
Student start_time end_time at_least_one_var1_a
1 1 2010-03-01 2011-03-01 TRUE
2 1 2011-03-01 2012-03-01 TRUE
3 1 2012-03-01 2013-03-01 TRUE
4 1 2013-03-01 2014-03-01 FALSE
5 1 2014-03-01 2015-03-01 FALSE
6 1 2015-03-01 2016-03-01 FALSE
7 1 2016-03-01 2017-03-01 FALSE
8 1 2017-03-01 2018-03-01 TRUE
9 1 2018-03-01 2019-03-01 TRUE
10 1 2019-03-01 2020-03-01 FALSE
11 2 2010-03-01 2011-03-01 FALSE
12 2 2011-03-01 2012-03-01 FALSE
13 2 2012-03-01 2013-03-01 FALSE
14 2 2013-03-01 2014-03-01 FALSE
15 2 2014-03-01 2015-03-01 TRUE
16 2 2015-03-01 2016-03-01 TRUE
17 2 2016-03-01 2017-03-01 TRUE
18 2 2017-03-01 2018-03-01 TRUE
19 2 2018-03-01 2019-03-01 FALSE
20 2 2019-03-01 2020-03-01 FALSE
My problem: I don't know how to do this in SQL.
I tried the following logic:
Step 1: create a CTE containing all the date ranges
WITH date_ranges AS (
SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
SELECT '2011-03-01', '2012-03-01' UNION ALL
SELECT '2012-03-01', '2013-03-01' UNION ALL
SELECT '2013-03-01', '2014-03-01' UNION ALL
SELECT '2014-03-01', '2015-03-01' UNION ALL
SELECT '2015-03-01', '2016-03-01' UNION ALL
SELECT '2016-03-01', '2017-03-01' UNION ALL
SELECT '2017-03-01', '2018-03-01' UNION ALL
SELECT '2018-03-01', '2019-03-01' UNION ALL
SELECT '2019-03-01', '2020-03-01'
)
Step 2: write a "sandwich"-style join to check each student's rows against start_date and end_date:
joined_data AS (
SELECT
t.student,
d.start_date,
d.end_date,
t.var1
FROM
df t
JOIN
date_ranges d
ON
t.start <= d.end_date AND t.end >= d.start_date
)
But step 2 returns an empty result.
If step 2 worked, I could use a series of CASE WHEN statements to do the rest of the counting:
var1_counts AS (
SELECT
student,
start_date,
end_date,
COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
FROM
joined_data
GROUP BY
student, start_date, end_date
)
SELECT
student,
start_date,
end_date,
CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM
var1_counts;
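Can someone show me how to fix this? Note that Netezza does not support cross joins, correlated queries, or recursive queries (I usually work around these with techniques such as JOIN ... ON 1=1).
Thanks!
Apart from changing the column names (END is a reserved word), the SQL runs fine on Netezza.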
nzsql <<eof
create table df (student integer, var1 varchar(10), start_date date, end_date date);
eof
nzload -t df -delim " " <<eof
1 a 2010-01-01 2013-01-01
1 b 2010-01-01 2013-01-01
1 b 2013-05-05 2015-09-09
1 a 2017-10-10 2018-09-01
2 c 2010-01-01 2014-01-01
2 a 2015-01-01 2017-09-01
2 b 2019-01-01 2023-03-05
eof
#########################################################
nzsql <<eof
WITH date_ranges AS (
SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
SELECT '2011-03-01', '2012-03-01' UNION ALL
SELECT '2012-03-01', '2013-03-01' UNION ALL
SELECT '2013-03-01', '2014-03-01' UNION ALL
SELECT '2014-03-01', '2015-03-01' UNION ALL
SELECT '2015-03-01', '2016-03-01' UNION ALL
SELECT '2016-03-01', '2017-03-01' UNION ALL
SELECT '2017-03-01', '2018-03-01' UNION ALL
SELECT '2018-03-01', '2019-03-01' UNION ALL
SELECT '2019-03-01', '2020-03-01'
),
joined_data AS (
SELECT
t.student,
d.start_date,
d.end_date,
t.var1
FROM
df t
JOIN
date_ranges d
ON
t.start_date <= d.end_date AND t.end_date >= d.start_date
),
var1_counts AS (
SELECT
student,
start_date,
end_date,
COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
FROM
joined_data
GROUP BY
student, start_date, end_date
)
SELECT
student,
start_date,
end_date,
CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM
var1_counts;
eof
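One caveat with the inner join: a window only shows up in the output if at least one of the student's rows overlaps it. With the sample data, student 1 has no rows overlapping 2016-03-01 to 2017-03-01 or 2019-03-01 to 2020-03-01, so those two windows are missing rather than reported as FALSE. One possible way around this is to build every (student, window) pair first with JOIN ... ON 1 = 1 and then LEFT JOIN df onto that grid. The sketch below checks the idea with Python's built-in sqlite3 module as a stand-in, so it needs no Netezza instance. The names students, grid and has_a are made up for illustration, the dates are stored as ISO text (an assumption that makes string comparison behave like date comparison), and the SQL has not been run on Netezza.

```python
import sqlite3

# Sample rows from the question; columns renamed to start_date/end_date
# because END is reserved.
ROWS = [
    (1, "a", "2010-01-01", "2013-01-01"),
    (1, "b", "2010-01-01", "2013-01-01"),
    (1, "b", "2013-05-05", "2015-09-09"),
    (1, "a", "2017-10-10", "2018-09-01"),
    (2, "c", "2010-01-01", "2014-01-01"),
    (2, "a", "2015-01-01", "2017-09-01"),
    (2, "b", "2019-01-01", "2023-03-05"),
]

# The ten yearly windows hard-coded in the question (2010-03-01 .. 2020-03-01).
WINDOWS = [(f"{y}-03-01", f"{y + 1}-03-01") for y in range(2010, 2020)]

QUERY = """
WITH students AS (
    SELECT DISTINCT student FROM df
),
grid AS (
    -- JOIN ... ON 1 = 1 stands in for CROSS JOIN: every student x every window
    SELECT s.student, d.start_date, d.end_date
    FROM students s
    JOIN date_ranges d ON 1 = 1
)
SELECT
    g.student,
    g.start_date,
    g.end_date,
    -- 1 if any overlapping row has var1 = 'a'; unmatched windows give 0
    MAX(CASE WHEN t.var1 = 'a' THEN 1 ELSE 0 END) AS has_a
FROM grid g
LEFT JOIN df t
    ON  t.student = g.student
    AND t.start_date <= g.end_date
    AND t.end_date >= g.start_date
GROUP BY g.student, g.start_date, g.end_date
ORDER BY g.student, g.start_date
"""


def flag_windows():
    """Load the sample data into an in-memory SQLite database and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE df (student INTEGER, var1 TEXT, start_date TEXT, end_date TEXT)"
    )
    con.execute("CREATE TABLE date_ranges (start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO df VALUES (?, ?, ?, ?)", ROWS)
    con.executemany("INSERT INTO date_ranges VALUES (?, ?)", WINDOWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for student, start, end, has_a in flag_windows():
        print(student, start, end, "TRUE" if has_a else "FALSE")
```

On Netezza the same shape should apply with date_ranges kept as a CTE and the literals cast to DATE, but treat that as untested. MAX over a 0/1 CASE is used here instead of COUNT plus a second CASE only to keep the sketch short.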