SQL: Checking whether a student meets a condition in each year of a date range


I am using Netezza SQL.

I have the following table (called "df"; all dates are stored as DATE and look like 2010-01-01 00:00:00):

  student var1      start        end
       1    a 2010-01-01 2013-01-01
       1    b 2010-01-01 2013-01-01
       1    b 2013-05-05 2015-09-09
       1    a 2017-10-10 2018-09-01
       2    c 2010-01-01 2014-01-01
       2    a 2015-01-01 2017-09-01
       2    b 2019-01-01 2023-03-05

What I want:

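  • Consider the years 2010 through 2020.
  • For each student and for each period running from March 1 of one year to March 1 of the next, I want to find out whether the student has at least one row with var1 = a.
  • If so, the result is TRUE, otherwise FALSE (note: if a student has no information at all for a given year, the result is still FALSE).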

The final result should look like this:

  Student start_time   end_time at_least_one_var1_a
1        1 2010-03-01 2011-03-01                TRUE
2        1 2011-03-01 2012-03-01                TRUE
3        1 2012-03-01 2013-03-01                TRUE
4        1 2013-03-01 2014-03-01               FALSE
5        1 2014-03-01 2015-03-01               FALSE
6        1 2015-03-01 2016-03-01               FALSE
7        1 2016-03-01 2017-03-01               FALSE
8        1 2017-03-01 2018-03-01                TRUE
9        1 2018-03-01 2019-03-01                TRUE
10       1 2019-03-01 2020-03-01               FALSE
11       2 2010-03-01 2011-03-01               FALSE
12       2 2011-03-01 2012-03-01               FALSE
13       2 2012-03-01 2013-03-01               FALSE
14       2 2013-03-01 2014-03-01               FALSE
15       2 2014-03-01 2015-03-01                TRUE
16       2 2015-03-01 2016-03-01                TRUE
17       2 2016-03-01 2017-03-01                TRUE
18       2 2017-03-01 2018-03-01                TRUE
19       2 2018-03-01 2019-03-01               FALSE
20       2 2019-03-01 2020-03-01               FALSE
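The rule in the bullets above boils down to an interval-overlap test: a student's row counts toward a March-1-to-March-1 period if its [start, end] range touches that period. The Python sketch below is not Netezza code. It is a hypothetical reference implementation (the names `march_periods`, `overlaps` and `at_least_one_a` are made up for illustration) that assumes inclusive endpoints, like the ON clause in step 2, and checks that this reading reproduces the expected TRUE/FALSE column above for the sample data.

```python
from datetime import date

# Sample rows from the question's df table: (student, var1, start, end)
ROWS = [
    (1, "a", date(2010, 1, 1), date(2013, 1, 1)),
    (1, "b", date(2010, 1, 1), date(2013, 1, 1)),
    (1, "b", date(2013, 5, 5), date(2015, 9, 9)),
    (1, "a", date(2017, 10, 10), date(2018, 9, 1)),
    (2, "c", date(2010, 1, 1), date(2014, 1, 1)),
    (2, "a", date(2015, 1, 1), date(2017, 9, 1)),
    (2, "b", date(2019, 1, 1), date(2023, 3, 5)),
]


def march_periods(first_year, last_year):
    """One (start, end) period per year: March 1 to March 1 of the following year."""
    return [(date(y, 3, 1), date(y + 1, 3, 1)) for y in range(first_year, last_year + 1)]


def overlaps(row_start, row_end, period_start, period_end):
    """Inclusive overlap test, the same shape as the question's ON clause."""
    return row_start <= period_end and row_end >= period_start


def at_least_one_a(rows, periods):
    """Every (student, period) pair, flagged True if any var1 = 'a' row overlaps it.

    Students with no rows in a period still get an entry (flagged False),
    matching the note that missing information counts as FALSE.
    """
    students = sorted({row[0] for row in rows})
    return [
        (
            student,
            p_start,
            p_end,
            any(
                s == student and v == "a" and overlaps(r_start, r_end, p_start, p_end)
                for s, v, r_start, r_end in rows
            ),
        )
        for student in students
        for p_start, p_end in periods
    ]


result = at_least_one_a(ROWS, march_periods(2010, 2019))
for student, start, end, flag in result:
    print(student, start, end, "TRUE" if flag else "FALSE")
```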

My problem: I don't know how to do this in SQL.

I tried the following logic:

Step 1: create a CTE containing all the date ranges

WITH date_ranges AS (
  SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
  SELECT '2011-03-01', '2012-03-01' UNION ALL
  SELECT '2012-03-01', '2013-03-01' UNION ALL
  SELECT '2013-03-01', '2014-03-01' UNION ALL
  SELECT '2014-03-01', '2015-03-01' UNION ALL
  SELECT '2015-03-01', '2016-03-01' UNION ALL
  SELECT '2016-03-01', '2017-03-01' UNION ALL
  SELECT '2017-03-01', '2018-03-01' UNION ALL
  SELECT '2018-03-01', '2019-03-01' UNION ALL
  SELECT '2019-03-01', '2020-03-01'
)
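The ten periods in step 1 are typed out by hand. Since the question says recursive queries are unavailable, one hypothetical way to generate them instead is a ten-row `digits` CTE combined with date arithmetic. The sketch below runs that idea on SQLite through Python's `sqlite3` module; `date(..., '+N years')` is SQLite's date syntax, not Netezza's, so on Netezza the arithmetic would have to be rewritten with Netezza's own date functions (an assumption to check against its documentation).

```python
import sqlite3

# A non-recursive digits CTE (0-9) turned into ten March-1-to-March-1 periods.
# date('2010-03-01', '+N years') is SQLite-specific date arithmetic.
PERIODS_SQL = """
WITH digits AS (
  SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
  SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL
  SELECT 8 UNION ALL SELECT 9
)
SELECT date('2010-03-01', '+' || n || ' years')       AS start_date,
       date('2010-03-01', '+' || (n + 1) || ' years') AS end_date
FROM digits
ORDER BY n
"""

conn = sqlite3.connect(":memory:")
date_ranges = conn.execute(PERIODS_SQL).fetchall()
for start_date, end_date in date_ranges:
    print(start_date, end_date)
```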

Step 2: write a "sandwich"-style join that checks each student's rows against the start_date and end_date of each range:

joined_data AS (
  SELECT 
    t.student,
    d.start_date,
    d.end_date,
    t.var1
  FROM 
   df t
  JOIN 
    date_ranges d
  ON 
    t.start <= d.end_date AND t.end >= d.start_date
)

But step 2 returns an empty result.
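For comparison, the sketch below runs the step 1 and step 2 logic on SQLite (via Python's `sqlite3`, standing in for Netezza) with the table's columns named `start_date` and `end_date` instead of `start` and `end`, which is the change the answer below points to. With the sample data, the overlap join returns rows rather than an empty result. SQLite has no DATE type, so the sketch assumes dates stored as ISO-8601 text, which compares correctly as strings; Netezza stores DATE values natively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Columns renamed to start_date / end_date; ISO-8601 text stands in for DATE.
conn.execute("CREATE TABLE df (student INTEGER, var1 TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?, ?, ?, ?)",
    [
        (1, "a", "2010-01-01", "2013-01-01"),
        (1, "b", "2010-01-01", "2013-01-01"),
        (1, "b", "2013-05-05", "2015-09-09"),
        (1, "a", "2017-10-10", "2018-09-01"),
        (2, "c", "2010-01-01", "2014-01-01"),
        (2, "a", "2015-01-01", "2017-09-01"),
        (2, "b", "2019-01-01", "2023-03-05"),
    ],
)

# The same ten rows as step 1, built with a Python loop for brevity.
date_ranges = " UNION ALL\n  ".join(
    f"SELECT '{y}-03-01' AS start_date, '{y + 1}-03-01' AS end_date"
    for y in range(2010, 2020)
)

joined = conn.execute(f"""
WITH date_ranges AS (
  {date_ranges}
)
SELECT t.student, d.start_date, d.end_date, t.var1
FROM df t
JOIN date_ranges d
  ON t.start_date <= d.end_date AND t.end_date >= d.start_date
ORDER BY t.student, d.start_date, t.var1
""").fetchall()

print(len(joined), "joined rows")
```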

If step 2 worked, I could use a series of CASE WHEN statements to do the rest of the counting:

var1_counts AS (
  SELECT 
    student,
    start_date,
    end_date,
    COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
  FROM 
    joined_data
  GROUP BY 
    student, start_date, end_date
)
SELECT 
  student,
  start_date,
  end_date,
  CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM 
  var1_counts;
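Once the join returns rows, this counting step works, but because step 2 is an inner join, a (student, period) pair with no overlapping rows at all never reaches `var1_counts`. Such pairs are missing from the output instead of being reported as FALSE. The sketch below (SQLite via Python's `sqlite3` standing in for Netezza, with the columns renamed to `start_date`/`end_date` as an assumption) runs steps 1 to 3 on the sample data: only 18 of the 20 expected rows come back, because student 1 has no rows for the 2016-03-01 and 2019-03-01 periods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (student INTEGER, var1 TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?, ?, ?, ?)",
    [
        (1, "a", "2010-01-01", "2013-01-01"),
        (1, "b", "2010-01-01", "2013-01-01"),
        (1, "b", "2013-05-05", "2015-09-09"),
        (1, "a", "2017-10-10", "2018-09-01"),
        (2, "c", "2010-01-01", "2014-01-01"),
        (2, "a", "2015-01-01", "2017-09-01"),
        (2, "b", "2019-01-01", "2023-03-05"),
    ],
)
date_ranges = " UNION ALL\n  ".join(
    f"SELECT '{y}-03-01' AS start_date, '{y + 1}-03-01' AS end_date"
    for y in range(2010, 2020)
)

# Steps 1-3 from the question, unchanged apart from the column names.
rows = conn.execute(f"""
WITH date_ranges AS (
  {date_ranges}
),
joined_data AS (
  SELECT t.student, d.start_date, d.end_date, t.var1
  FROM df t
  JOIN date_ranges d
    ON t.start_date <= d.end_date AND t.end_date >= d.start_date
),
var1_counts AS (
  SELECT student, start_date, end_date,
         COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
  FROM joined_data
  GROUP BY student, start_date, end_date
)
SELECT student, start_date, end_date,
       CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM var1_counts
ORDER BY student, start_date
""").fetchall()

present = {(student, start) for student, start, _, _ in rows}
print(len(rows), "rows returned")
print("missing:", [(s, f"{y}-03-01") for s in (1, 2) for y in range(2010, 2020)
                   if (s, f"{y}-03-01") not in present])
```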

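Can someone show me how to fix this? Note that Netezza does not support cross joins, correlated subqueries, or recursive queries (I usually work around these with techniques such as JOIN ON 1=1).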

Thanks!

  • Note: I'm currently looking into the TO_CHAR(), TO_DATE(), CAST(), DATE_PART() and DATE_TRUNC() functions to see whether they would help.
Answer

Apart from renaming the columns (END is a reserved identifier), the SQL works fine on Netezza.


nzsql <<eof
    create table df (student integer, var1 varchar(10), start_date date, end_date date);
eof

nzload -t df -delim " " <<eof
1 a 2010-01-01 2013-01-01
1 b 2010-01-01 2013-01-01
1 b 2013-05-05 2015-09-09
1 a 2017-10-10 2018-09-01
2 c 2010-01-01 2014-01-01
2 a 2015-01-01 2017-09-01
2 b 2019-01-01 2023-03-05
eof

#########################################################

nzsql <<eof

WITH date_ranges AS (
  SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
  SELECT '2011-03-01', '2012-03-01' UNION ALL
  SELECT '2012-03-01', '2013-03-01' UNION ALL
  SELECT '2013-03-01', '2014-03-01' UNION ALL
  SELECT '2014-03-01', '2015-03-01' UNION ALL
  SELECT '2015-03-01', '2016-03-01' UNION ALL
  SELECT '2016-03-01', '2017-03-01' UNION ALL
  SELECT '2017-03-01', '2018-03-01' UNION ALL
  SELECT '2018-03-01', '2019-03-01' UNION ALL
  SELECT '2019-03-01', '2020-03-01'
),

joined_data AS (
  SELECT
    t.student,
    d.start_date,
    d.end_date,
    t.var1
  FROM
   df t
  JOIN
    date_ranges d
  ON
    t.start_date <= d.end_date AND t.end_date >= d.start_date
),

var1_counts AS (
  SELECT
    student,
    start_date,
    end_date,
    COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
  FROM
    joined_data
  GROUP BY
    student, start_date, end_date
)

SELECT
  student,
  start_date,
  end_date,
  CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM
  var1_counts;

eof
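The query above returns a row only for (student, period) pairs that overlap at least one row of df, and labels them Yes/No. To get all 20 rows with TRUE/FALSE, as in the expected output in the question, one option is to build the full student-by-period grid first, using the JOIN ON 1=1 workaround the question mentions in place of CROSS JOIN, and then LEFT JOIN df onto it. This is a sketch run on SQLite via Python's `sqlite3`, not tested on Netezza; the `students` and `grid` CTE names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (student INTEGER, var1 TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?, ?, ?, ?)",
    [
        (1, "a", "2010-01-01", "2013-01-01"),
        (1, "b", "2010-01-01", "2013-01-01"),
        (1, "b", "2013-05-05", "2015-09-09"),
        (1, "a", "2017-10-10", "2018-09-01"),
        (2, "c", "2010-01-01", "2014-01-01"),
        (2, "a", "2015-01-01", "2017-09-01"),
        (2, "b", "2019-01-01", "2023-03-05"),
    ],
)
date_ranges = " UNION ALL\n  ".join(
    f"SELECT '{y}-03-01' AS start_date, '{y + 1}-03-01' AS end_date"
    for y in range(2010, 2020)
)

result = conn.execute(f"""
WITH date_ranges AS (
  {date_ranges}
),
students AS (
  SELECT DISTINCT student FROM df
),
grid AS (
  -- every student paired with every period (JOIN ... ON 1 = 1 instead of CROSS JOIN)
  SELECT s.student, d.start_date, d.end_date
  FROM students s
  JOIN date_ranges d ON 1 = 1
)
SELECT g.student, g.start_date, g.end_date,
       CASE WHEN COUNT(t.student) > 0 THEN 'TRUE' ELSE 'FALSE' END AS at_least_one_var1_a
FROM grid g
LEFT JOIN df t
  ON t.student = g.student
 AND t.var1 = 'a'
 AND t.start_date <= g.end_date
 AND t.end_date >= g.start_date
GROUP BY g.student, g.start_date, g.end_date
ORDER BY g.student, g.start_date
""").fetchall()

for row in result:
    print(*row)
```

Putting `t.var1 = 'a'` in the ON clause rather than in a WHERE clause matters here: a WHERE filter on `t` would discard the unmatched grid rows that the LEFT JOIN exists to keep, turning it back into an inner join.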

© www.soinside.com 2019 - 2024. All rights reserved.