I'm using Netezza SQL.
I have a table named df (all dates are of type DATE and display like 2010-01-01 00:00:00):
student | var1 | start | end |
---|---|---|---|
1 | a | 2010-01-01 | 2013-01-01 |
1 | b | 2010-01-01 | 2013-01-01 |
1 | b | 2013-05-05 | 2015-09-09 |
1 | a | 2017-10-10 | 2018-09-01 |
2 | c | 2010-01-01 | 2014-01-01 |
2 | a | 2015-01-01 | 2017-09-01 |
2 | b | 2019-01-01 | 2023-03-05 |
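My goal: for each student and each yearly range from 2010-03-01 to 2020-03-01, flag whether the student has at least one row with var1 = 'a' that overlaps the range.
The final result would look like this: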
SN | student | start_date | end_date | at_least_one_var1_a |
---|---|---|---|---|
1 | 1 | 2010-03-01 | 2011-03-01 | TRUE |
2 | 1 | 2011-03-01 | 2012-03-01 | TRUE |
3 | 1 | 2012-03-01 | 2013-03-01 | TRUE |
4 | 1 | 2013-03-01 | 2014-03-01 | FALSE |
5 | 1 | 2014-03-01 | 2015-03-01 | FALSE |
6 | 1 | 2015-03-01 | 2016-03-01 | FALSE |
7 | 1 | 2016-03-01 | 2017-03-01 | FALSE |
8 | 1 | 2017-03-01 | 2018-03-01 | TRUE |
9 | 1 | 2018-03-01 | 2019-03-01 | TRUE |
10 | 1 | 2019-03-01 | 2020-03-01 | FALSE |
11 | 2 | 2010-03-01 | 2011-03-01 | FALSE |
12 | 2 | 2011-03-01 | 2012-03-01 | FALSE |
13 | 2 | 2012-03-01 | 2013-03-01 | FALSE |
14 | 2 | 2013-03-01 | 2014-03-01 | FALSE |
15 | 2 | 2014-03-01 | 2015-03-01 | TRUE |
16 | 2 | 2015-03-01 | 2016-03-01 | TRUE |
17 | 2 | 2016-03-01 | 2017-03-01 | TRUE |
18 | 2 | 2017-03-01 | 2018-03-01 | TRUE |
19 | 2 | 2018-03-01 | 2019-03-01 | FALSE |
20 | 2 | 2019-03-01 | 2020-03-01 | FALSE |
My problem: I don't know how to do this in SQL.
I tried the following logic:
Step 1: Create a CTE containing all the date ranges
WITH step1 AS (
SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
SELECT '2011-03-01', '2012-03-01' UNION ALL
SELECT '2012-03-01', '2013-03-01' UNION ALL
SELECT '2013-03-01', '2014-03-01' UNION ALL
SELECT '2014-03-01', '2015-03-01' UNION ALL
SELECT '2015-03-01', '2016-03-01' UNION ALL
SELECT '2016-03-01', '2017-03-01' UNION ALL
SELECT '2017-03-01', '2018-03-01' UNION ALL
SELECT '2018-03-01', '2019-03-01' UNION ALL
SELECT '2019-03-01', '2020-03-01'
)
Step 2: Write a "sandwich"-style join that checks each student's rows against the start_date and end_date of every range:
joined_data AS (
SELECT
t.student,
d.start_date,
d.end_date,
t.var1
FROM
df t
JOIN
date_ranges d
ON
t.start <= d.end_date AND t.end >= d.start_date
)
But step 2 returns an empty result.
If step 2 worked, I could use a series of CASE WHEN statements to do the rest of the counting:
var1_counts AS (
SELECT
student,
start_date,
end_date,
COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
FROM
joined_data
GROUP BY
student, start_date, end_date
)
SELECT
student,
start_date,
end_date,
CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM
var1_counts;
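Can someone please show me how to fix this? Note that Netezza does not support cross joins, correlated queries, or recursive queries (I usually work around these with techniques such as JOIN ON 1=1).
Note: I'm currently looking into the TO_CHAR(), TO_DATE(), CAST(), DATE_PART(), and DATE_TRUNC() functions to see whether they might help.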
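Apart from renaming the columns (END is a reserved word) and naming the first CTE date_ranges so that it matches the join, your SQL runs fine on Netezza.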
nzsql <<eof
create table df (student integer, var1 varchar(10), start_date date, end_date date);
eof
nzload -t df -delim " " <<eof
1 a 2010-01-01 2013-01-01
1 b 2010-01-01 2013-01-01
1 b 2013-05-05 2015-09-09
1 a 2017-10-10 2018-09-01
2 c 2010-01-01 2014-01-01
2 a 2015-01-01 2017-09-01
2 b 2019-01-01 2023-03-05
eof
#########################################################
nzsql <<eof
WITH date_ranges AS (
SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
SELECT '2011-03-01', '2012-03-01' UNION ALL
SELECT '2012-03-01', '2013-03-01' UNION ALL
SELECT '2013-03-01', '2014-03-01' UNION ALL
SELECT '2014-03-01', '2015-03-01' UNION ALL
SELECT '2015-03-01', '2016-03-01' UNION ALL
SELECT '2016-03-01', '2017-03-01' UNION ALL
SELECT '2017-03-01', '2018-03-01' UNION ALL
SELECT '2018-03-01', '2019-03-01' UNION ALL
SELECT '2019-03-01', '2020-03-01'
),
joined_data AS (
SELECT
t.student,
d.start_date,
d.end_date,
t.var1
FROM
df t
JOIN
date_ranges d
ON
t.start_DATE <= d.end_date AND t.end_DATE >= d.start_date
),
var1_counts AS (
SELECT
student,
start_date,
end_date,
COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
FROM
joined_data
GROUP BY
student, start_date, end_date
)
SELECT
student,
start_date,
end_date,
CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM
var1_counts;
eof
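Some explanation first: the query uses cross join together with distinct (to pair every student with every year range), and exists rather than a left join, because... The query: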
with yearlist as (
select '2010-03-01' as start_date, '2011-03-01' as end_date union all
select '2011-03-01', '2012-03-01' union all
select '2012-03-01', '2013-03-01' union all
select '2013-03-01', '2014-03-01' union all
select '2014-03-01', '2015-03-01' union all
select '2015-03-01', '2016-03-01' union all
select '2016-03-01', '2017-03-01' union all
select '2017-03-01', '2018-03-01' union all
select '2018-03-01', '2019-03-01' union all
select '2019-03-01', '2020-03-01'
), studentlist as (
select distinct student
from df
)
select
studentlist.student,
yearlist.start_date,
yearlist.end_date,
case when exists (
select *
from df
where df.var1 = 'a'
and df.student = studentlist.student
and df.end > yearlist.start_date
and df.start < yearlist.end_date
) then 'true' else 'false' end as has_a
from yearlist
cross join studentlist
order by studentlist.student, yearlist.start_date
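With your original code we don't get your "expected result": it returns only 18 rows instead of 20, because the inner join drops every student/range pair that has no overlapping row in df, so some years never appear.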
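To see which pairs go missing, a diagnostic along these lines may help. It is only a sketch: it assumes df uses the renamed columns start_date and end_date, it keeps the original inclusive overlap test, and the CTE name students is made up for illustration. Every student is paired with every range, then the query keeps the pairs for which the left join finds no overlapping row.

```sql
WITH date_ranges AS (
    SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
    SELECT '2011-03-01', '2012-03-01' UNION ALL
    SELECT '2012-03-01', '2013-03-01' UNION ALL
    SELECT '2013-03-01', '2014-03-01' UNION ALL
    SELECT '2014-03-01', '2015-03-01' UNION ALL
    SELECT '2015-03-01', '2016-03-01' UNION ALL
    SELECT '2016-03-01', '2017-03-01' UNION ALL
    SELECT '2017-03-01', '2018-03-01' UNION ALL
    SELECT '2018-03-01', '2019-03-01' UNION ALL
    SELECT '2019-03-01', '2020-03-01'
),
students AS (
    -- hypothetical helper CTE: one row per student
    SELECT DISTINCT student FROM df
)
SELECT s.student, d.start_date, d.end_date
FROM students s
CROSS JOIN date_ranges d
LEFT JOIN df t
       ON t.student = s.student
      AND t.start_date <= d.end_date     -- same overlap test as the original join
      AND t.end_date   >= d.start_date
WHERE t.student IS NULL                  -- no overlapping row at all for this pair
ORDER BY s.student, d.start_date;
```

On the sample data this should list two pairs, both for student 1: the ranges starting 2016-03-01 and 2019-03-01. Those are the two rows missing from the 18.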
Here is a rewrite:
WITH date_ranges AS (
-- _v_vector_idx is a Netezza system view of consecutive integers (column idx);
-- each idx becomes one yearly range starting on 1 March.
select ('2010-03-01'::date + interval '1 year' * idx)::date as start_date,
(start_date + interval '1 year')::date as end_date
from _v_vector_idx where start_date between '2010-03-01' and '2019-03-01'
)
SELECT
t.student,
d.start_date,
d.end_date,
-- max() is non-null only if at least one 'a' row overlaps the range
NVL2(max(case when t.start_DATE <= d.end_date AND t.end_DATE >= d.start_date and t.var1 = 'a' then 1 else null end),'TRUE','FALSE') AS at_least_one_var1_a
FROM
df t
CROSS JOIN date_ranges d
group by 1,2,3
order by 1,2,3;
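To check what the date_ranges CTE generates on its own, something like the following may help. It is only a sketch: it filters on idx directly rather than on the start_date alias, and it assumes that _v_vector_idx starts at idx = 0, as the rewrite above implies.

```sql
-- Generate the ten yearly ranges without hard-coding them.
select ('2010-03-01'::date + interval '1 year' * idx)::date       as start_date,
       ('2010-03-01'::date + interval '1 year' * (idx + 1))::date as end_date
from _v_vector_idx
where idx between 0 and 9   -- ranges starting 2010-03-01 through 2019-03-01
order by idx;
```

If that assumption holds, the first row is 2010-03-01 to 2011-03-01 and the last is 2019-03-01 to 2020-03-01, matching the hand-written list in the question.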