Find whether a student meets a condition within given year ranges

Question  Votes: 0  Answers: 3

I am using Netezza SQL.

I have a table named df. All date columns are of type DATE and look like 2010-01-01 00:00:00:

student var1 start end
1 a 2010-01-01 2013-01-01
1 b 2010-01-01 2013-01-01
1 b 2013-05-05 2015-09-09
1 a 2017-10-10 2018-09-01
2 c 2010-01-01 2014-01-01
2 a 2015-01-01 2017-09-01
2 b 2019-01-01 2023-03-05

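My problem:

  • Consider the years 2010 through 2020
  • For each student, within each year running from March 1 to the following March 1, I want to find out whether the student has at least one row with var1 = a
  • If so, the value is TRUE, otherwise FALSE
    Note: when a student has no information for a given year, the value is still FALSE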

The final result should look like this:

SN student start_date end_date at_least_one_var1_a
1 1 2010-03-01 2011-03-01 TRUE
2 1 2011-03-01 2012-03-01 TRUE
3 1 2012-03-01 2013-03-01 TRUE
4 1 2013-03-01 2014-03-01 FALSE
5 1 2014-03-01 2015-03-01 FALSE
6 1 2015-03-01 2016-03-01 FALSE
7 1 2016-03-01 2017-03-01 FALSE
8 1 2017-03-01 2018-03-01 TRUE
9 1 2018-03-01 2019-03-01 TRUE
10 1 2019-03-01 2020-03-01 FALSE
11 2 2010-03-01 2011-03-01 FALSE
12 2 2011-03-01 2012-03-01 FALSE
13 2 2012-03-01 2013-03-01 FALSE
14 2 2013-03-01 2014-03-01 FALSE
15 2 2014-03-01 2015-03-01 TRUE
16 2 2015-03-01 2016-03-01 TRUE
17 2 2016-03-01 2017-03-01 TRUE
18 2 2017-03-01 2018-03-01 TRUE
19 2 2018-03-01 2019-03-01 FALSE
20 2 2019-03-01 2020-03-01 FALSE

My problem: I don't know how to do this in SQL.

I tried the following logic:

Step 1: Create a CTE containing all the date ranges

WITH date_ranges AS (
  SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
  SELECT '2011-03-01', '2012-03-01' UNION ALL
  SELECT '2012-03-01', '2013-03-01' UNION ALL
  SELECT '2013-03-01', '2014-03-01' UNION ALL
  SELECT '2014-03-01', '2015-03-01' UNION ALL
  SELECT '2015-03-01', '2016-03-01' UNION ALL
  SELECT '2016-03-01', '2017-03-01' UNION ALL
  SELECT '2017-03-01', '2018-03-01' UNION ALL
  SELECT '2018-03-01', '2019-03-01' UNION ALL
  SELECT '2019-03-01', '2020-03-01'
)

Step 2: Write a "sandwich"-style join that checks each student's rows against start_date and end_date:

joined_data AS (
  SELECT 
    t.student,
    d.start_date,
    d.end_date,
    t.var1
  FROM 
    df t
  JOIN 
    date_ranges d
  ON 
    t.start <= d.end_date AND t.end >= d.start_date
)

But step 2 returns an empty result.

If step 2 worked, I could then use a series of CASE WHEN statements to do the rest of the counting:

var1_counts AS (
  SELECT 
    student,
    start_date,
    end_date,
    COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
  FROM 
    joined_data
  GROUP BY 
    student, start_date, end_date
)
SELECT 
  student,
  start_date,
  end_date,
  CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM 
  var1_counts;

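Can someone tell me how to solve this? Note that Netezza does not support cross joins, correlated queries, or recursive queries (I usually handle these cases with techniques such as a JOIN on 1=1).

Note: I am currently looking into the TO_CHAR(), TO_DATE(), CAST(), DATE_PART() and DATE_TRUNC() functions to see whether they would help.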

sql netezza
3 answers
1 vote

Apart from changing the column names (END is a reserved word), the SQL runs fine on Netezza.


nzsql <<eof
    create table df (student integer, var1 varchar(10), start_date date, end_date date);
eof

nzload -t df -delim " " <<eof
1 a 2010-01-01 2013-01-01
1 b 2010-01-01 2013-01-01
1 b 2013-05-05 2015-09-09
1 a 2017-10-10 2018-09-01
2 c 2010-01-01 2014-01-01
2 a 2015-01-01 2017-09-01
2 b 2019-01-01 2023-03-05
eof

#########################################################

nzsql <<eof

WITH date_ranges AS (
  SELECT '2010-03-01' AS start_date, '2011-03-01' AS end_date UNION ALL
  SELECT '2011-03-01', '2012-03-01' UNION ALL
  SELECT '2012-03-01', '2013-03-01' UNION ALL
  SELECT '2013-03-01', '2014-03-01' UNION ALL
  SELECT '2014-03-01', '2015-03-01' UNION ALL
  SELECT '2015-03-01', '2016-03-01' UNION ALL
  SELECT '2016-03-01', '2017-03-01' UNION ALL
  SELECT '2017-03-01', '2018-03-01' UNION ALL
  SELECT '2018-03-01', '2019-03-01' UNION ALL
  SELECT '2019-03-01', '2020-03-01'
),

joined_data AS (
  SELECT
    t.student,
    d.start_date,
    d.end_date,
    t.var1
  FROM
   df t
  JOIN
    date_ranges d
  ON
    t.start_DATE <= d.end_date AND t.end_DATE >= d.start_date
),

var1_counts AS (
  SELECT
    student,
    start_date,
    end_date,
    COUNT(CASE WHEN var1 = 'a' THEN 1 END) AS var1_count
  FROM
    joined_data
  GROUP BY
    student, start_date, end_date
)

SELECT
  student,
  start_date,
  end_date,
  CASE WHEN var1_count > 0 THEN 'Yes' ELSE 'No' END AS at_least_one_var1_a
FROM
  var1_counts;

eof


1 vote

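First, an explanation:

  • The result consists of m years x n distinct students, so use cross join and distinct
  • Use a date-range-overlap query to find the overlapping rows
  • Use exists instead of a left join: with a left join, one March-to-March range can overlap two rows of the actual table, so you may get more rows than expected, whereas exists never duplicates rows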

Query:

with yearlist as (
    select '2010-03-01' as start_date, '2011-03-01' as end_date union all
    select '2011-03-01', '2012-03-01' union all
    select '2012-03-01', '2013-03-01' union all
    select '2013-03-01', '2014-03-01' union all
    select '2014-03-01', '2015-03-01' union all
    select '2015-03-01', '2016-03-01' union all
    select '2016-03-01', '2017-03-01' union all
    select '2017-03-01', '2018-03-01' union all
    select '2018-03-01', '2019-03-01' union all
    select '2019-03-01', '2020-03-01'
), studentlist as (
    select distinct student
    from df
)
select
    studentlist.student,
    yearlist.start_date,
    yearlist.end_date,
    case when exists (
        select *
        from df
        where df.var1 = 'a'
        and df.student = studentlist.student
        and df.end > yearlist.start_date
        and df.start < yearlist.end_date
    ) then 'true' else 'false' end as has_a
from yearlist
cross join studentlist
order by studentlist.student, yearlist.start_date

Demo on DB<>Fiddle
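As a rough way to check the cross join plus exists approach against the expected output, here is a minimal sketch that runs the same query shape from Python. It uses an in-memory SQLite database as a stand-in for Netezza (an assumption; Netezza behavior is not verified here), renames the columns to start_date and end_date because END is reserved, and replaces the yearlist and studentlist CTEs with a small table and an inline subquery. The names ROWS, WINDOWS, result and true_windows are made up for this illustration.

```python
import sqlite3

# Sample rows from the question (columns renamed: END is a reserved word).
ROWS = [
    (1, "a", "2010-01-01", "2013-01-01"),
    (1, "b", "2010-01-01", "2013-01-01"),
    (1, "b", "2013-05-05", "2015-09-09"),
    (1, "a", "2017-10-10", "2018-09-01"),
    (2, "c", "2010-01-01", "2014-01-01"),
    (2, "a", "2015-01-01", "2017-09-01"),
    (2, "b", "2019-01-01", "2023-03-05"),
]
# March-to-March windows for 2010..2019; ISO date strings sort like dates.
WINDOWS = [(f"{y}-03-01", f"{y + 1}-03-01") for y in range(2010, 2020)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df (student INTEGER, var1 TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany("INSERT INTO df VALUES (?, ?, ?, ?)", ROWS)
conn.execute("CREATE TABLE yearlist (start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO yearlist VALUES (?, ?)", WINDOWS)

# One output row per (student, window); EXISTS only tests for an overlapping
# 'a' row, so it can never multiply rows the way a join can.
result = conn.execute("""
    SELECT s.student, y.start_date, y.end_date,
           CASE WHEN EXISTS (
               SELECT 1 FROM df
               WHERE df.var1 = 'a'
                 AND df.student = s.student
                 AND df.end_date > y.start_date
                 AND df.start_date < y.end_date
           ) THEN 'TRUE' ELSE 'FALSE' END AS has_a
    FROM yearlist y
    CROSS JOIN (SELECT DISTINCT student FROM df) s
    ORDER BY s.student, y.start_date
""").fetchall()

# Keep only the (student, window start) pairs flagged TRUE.
true_windows = [(student, start) for student, start, _end, flag in result
                if flag == "TRUE"]
print(len(result), true_windows)
```

On the sample data this yields 20 rows, and the TRUE windows line up with the expected table in the question.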


0 votes

Running your original code does not produce your "expected result" (only 18 rows instead of 20; a few years never appear).
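To see where the two rows go, the sketch below replays the inner join on the sample data and compares its groups with the full set of student/window pairs. It uses SQLite through Python as a stand-in for Netezza (an assumption), with the columns renamed to start_date and end_date; the names inner_groups, all_pairs and missing are only illustrative.

```python
import sqlite3

# Sample rows from the question (columns renamed: END is a reserved word).
ROWS = [
    (1, "a", "2010-01-01", "2013-01-01"),
    (1, "b", "2010-01-01", "2013-01-01"),
    (1, "b", "2013-05-05", "2015-09-09"),
    (1, "a", "2017-10-10", "2018-09-01"),
    (2, "c", "2010-01-01", "2014-01-01"),
    (2, "a", "2015-01-01", "2017-09-01"),
    (2, "b", "2019-01-01", "2023-03-05"),
]
# March-to-March windows for 2010..2019; ISO date strings sort like dates.
WINDOWS = [(f"{y}-03-01", f"{y + 1}-03-01") for y in range(2010, 2020)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df (student INTEGER, var1 TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany("INSERT INTO df VALUES (?, ?, ?, ?)", ROWS)
conn.execute("CREATE TABLE date_ranges (start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO date_ranges VALUES (?, ?)", WINDOWS)

# Same inner join and grouping keys as the original query: a (student, window)
# pair only survives if at least one row of df overlaps the window.
inner_groups = conn.execute("""
    SELECT t.student, d.start_date
    FROM df t
    JOIN date_ranges d
      ON t.start_date <= d.end_date AND t.end_date >= d.start_date
    GROUP BY t.student, d.start_date
""").fetchall()

# Every (student, window) pair the expected output should contain.
all_pairs = conn.execute("""
    SELECT s.student, d.start_date
    FROM (SELECT DISTINCT student FROM df) s
    CROSS JOIN date_ranges d
""").fetchall()

# Pairs dropped by the inner join because no row overlaps the window.
missing = sorted(set(all_pairs) - set(inner_groups))
print(len(inner_groups), len(all_pairs), missing)
```

On the sample data the inner join keeps 18 of the 20 pairs. The two that drop out are student 1's windows starting 2016-03-01 and 2019-03-01, where that student has no row at all, which is exactly the "missing information" case the question wants reported as FALSE.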

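Here is a rewrite:

  • Only one CTE (instead of many/nested CTEs)
  • The CTE is used to create the date ranges (easier to extend beyond 10 years if needed)
  • It does use CROSS JOIN (which is supported)
  • The result set contains all 20 rows
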
WITH date_ranges AS (
  select ('2010-03-01'::date + interval '1 year' * idx)::date as start_date,
         (start_date + interval '1 year')::date               as end_date
  from _v_vector_idx where start_date between '2010-03-01' and '2019-03-01'
)

SELECT
    t.student,
    d.start_date,
    d.end_date,
    NVL2(max(case when  t.start_DATE <= d.end_date AND t.end_DATE >= d.start_date and t.var1 = 'a' then 1 else null end),'TRUE','FALSE')
FROM
    df t
    CROSS JOIN date_ranges d

group by 1,2,3
order by 1,2,3;
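The same idea (windows generated from an index, then a conditional MAX over a cross join) can be tried outside Netezza. The sketch below is only an approximation in SQLite via Python: the hypothetical idx_table stands in for _v_vector_idx, SQLite's date() modifier replaces the interval arithmetic, and a CASE ... IS NOT NULL replaces NVL2, which SQLite lacks. None of this has been checked against Netezza itself.

```python
import sqlite3

# Sample rows from the question (columns renamed: END is a reserved word).
ROWS = [
    (1, "a", "2010-01-01", "2013-01-01"),
    (1, "b", "2010-01-01", "2013-01-01"),
    (1, "b", "2013-05-05", "2015-09-09"),
    (1, "a", "2017-10-10", "2018-09-01"),
    (2, "c", "2010-01-01", "2014-01-01"),
    (2, "a", "2015-01-01", "2017-09-01"),
    (2, "b", "2019-01-01", "2023-03-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df (student INTEGER, var1 TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany("INSERT INTO df VALUES (?, ?, ?, ?)", ROWS)

# Hypothetical stand-in for Netezza's _v_vector_idx: integers 0..9.
conn.execute("CREATE TABLE idx_table (idx INTEGER)")
conn.executemany("INSERT INTO idx_table VALUES (?)", [(i,) for i in range(10)])

flags = conn.execute("""
    WITH date_ranges AS (
        -- Window i starts i years after 2010-03-01 and ends one year later.
        SELECT date('2010-03-01', '+' || idx || ' years')       AS start_date,
               date('2010-03-01', '+' || (idx + 1) || ' years') AS end_date
        FROM idx_table
    )
    SELECT t.student, d.start_date, d.end_date,
           -- MAX is non-NULL only if some 'a' row overlaps the window.
           CASE WHEN MAX(CASE WHEN t.start_date <= d.end_date
                               AND t.end_date >= d.start_date
                               AND t.var1 = 'a' THEN 1 END) IS NOT NULL
                THEN 'TRUE' ELSE 'FALSE' END AS at_least_one_var1_a
    FROM df t
    CROSS JOIN date_ranges d
    GROUP BY t.student, d.start_date, d.end_date
    ORDER BY t.student, d.start_date
""").fetchall()

for row in flags:
    print(row)
```

Because every student is paired with every window before grouping, windows with no matching row still appear and simply come out as FALSE.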


