Splitting records into different groups by date using Teradata

I have the following data along with their `MIN`, `MAX`, and `CLOSE` dates.

I am trying to split these accounts into three groups based on these dates and show the count for each month in which they were active.

  custID         MINDate     MAXDATE         CLOSEDATE
  10001          1/1/2022   12/31/2022
  20001          7/6/2022   12/31/2022
  30001          4/5/2022   6/10/2022         6/10/2022
  40001          1/1/2022   12/31/2022 

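Grp1 - If the `custID` is available for the whole of 2022, it is shown under all 12 months.

Grp2 - If the `custID` is available for only a few days of a month, it is in `Grp2` for that particular month only, and in `Grp1` for the remaining months.

Grp3 - If the `custID` has `CLOSEDATE` populated, it is in `Grp3` for the month in which it closed.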

Desired output:

           jan22 Feb22 Mar22 Apr22 May22 Jun22 Jul22 Aug22 Sep22 Oct22 Nov22 Dec22       
Grp1         2    2     2      2     3      2    2      3    3     3    3     3   
Grp2                           1                 1    
Grp3                                        1

10001 - Y in Grp1 for all months

20001 - Y in Grp2 for July, Y from August to December

30001 - Y in Grp2 for April, Y in Grp1 for May, and Y in Grp3 for June

40001 - Y in Grp1 for all months

I tried to build the logic below, but got lost in the middle:

    WITH cte1 AS (
      SELECT custID,
        MINDate,
        MAXDATE,
        ADD_MONTHS(TRUNC(MINDate, 'MM'), ROW_NUMBER() OVER (PARTITION BY custID ORDER BY MINDate) - 1) AS Month
      FROM Have
      QUALIFY EXTRACT(MONTH FROM MINDate) = EXTRACT(MONTH FROM ADD_MONTHS(TRUNC(MINDate, 'MM'), ROW_NUMBER() OVER (PARTITION BY custID ORDER BY MINDate) - 1))
    ), cte2 AS (
      SELECT custID,
        Month,
        CASE WHEN Month = ADD_MONTHS(TRUNC(MAXDATE, 'MM'), 1) THEN LAST_DAY(MAXDATE) - MAXDATE + 1 ELSE 1 END AS Days
      FROM (
        SELECT custID,
          MIN(Month) AS Month,
          MAX(MAXDATE) AS MAXDATE
        FROM cte1
        GROUP BY custID
      ) t1
      INNER JOIN cte1 t2 ON t1.custID = t2.custID AND t1.Month <= t2.Month AND t2.Month < ADD_MONTHS(TRUNC(MAXDATE, 'MM'), 1)
    ), 
      SELECT Month, SUM(Days) AS Days
      FROM cte2
      GROUP BY Month
sql grouping teradata row-number
1 Answer

Here is how I would solve this:

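  1. Join to `sys_calendar.calendar` to get all of the dates between `MINDATE` and `MAXDATE`. Set each `sys_calendar.calendar.calendar_date` value to the first day of its month, then take the DISTINCT. That way we have one record for every `custid`/`month_start` combination.

     (You could also use Teradata's `EXPAND ON` clause so that you don't have to go to the `sys_calendar` table and do the `DISTINCT` nonsense, but `EXPAND ON` and `PIVOT` are not compatible in the same SQL submission.)

  2. Using a CASE expression, determine for each of these records which group it belongs to, using the logic you outlined.

  3. Pivot.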

This would look like:

    -- Step 1: one row per custID / month_start combination
    WITH expanded_months AS
    (
        SELECT DISTINCT Have.*, calendar_date - EXTRACT(DAY FROM calendar_date) + 1 AS month_start
        FROM Have
            INNER JOIN sys_calendar.calendar cal
                ON PERIOD(MINDATE, NEXT(MAXDATE)) CONTAINS cal.calendar_date
    )
    -- Step 2: assign each custID / month_start row to a group
    , month_groups AS
    (
        SELECT em.*, COUNT(month_start) OVER (PARTITION BY custID) AS month_count,
            CASE
                 -- the month in which the account closed
                 WHEN month_count < 12 AND month_start = CLOSEDATE - EXTRACT(DAY FROM CLOSEDATE) + 1 THEN 'Grp3'
                 -- the first month of an account that was not active all year
                 WHEN month_count < 12 AND LAG(month_start) OVER (PARTITION BY custid ORDER BY month_start) IS NULL THEN 'Grp2'
                 ELSE 'Grp1'
                 END AS Grp
        FROM expanded_months em
    )
    -- Step 3: pivot the months into columns, counting rows per group
    SELECT pvt.*
    FROM (SELECT Grp, month_start FROM month_groups) mg
    PIVOT (
          COUNT(*) AS val
          FOR month_start
          IN (
                DATE '2022-01-01' AS jan22,
                DATE '2022-02-01' AS feb22,
                DATE '2022-03-01' AS mar22,
                DATE '2022-04-01' AS apr22,
                DATE '2022-05-01' AS may22,
                DATE '2022-06-01' AS jun22,
                DATE '2022-07-01' AS jul22,
                DATE '2022-08-01' AS aug22,
                DATE '2022-09-01' AS sep22,
                DATE '2022-10-01' AS oct22,
                DATE '2022-11-01' AS nov22,
                DATE '2022-12-01' AS dec22
            )
          ) pvt
    ORDER BY Grp

    Grp   jan22_val feb22_val mar22_val apr22_val may22_val jun22_val jul22_val aug22_val sep22_val oct22_val nov22_val dec22_val
    Grp1  2         2         2         2         3         2         2         3         3         3         3         3
    Grp2  0         0         0         1         0         0         1         0         0         0         0         0
    Grp3  0         0         0         0         0         1         0         0         0         0         0         0
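
If you want to sanity-check the grouping rules without a Teradata session, the same three steps (expand to months, classify, pivot) can be sketched in plain Python. This is only a rough cross-check, not a replacement for the SQL. The names `HAVE`, `month_starts`, `classify`, and `pivot` are made up for illustration, and it assumes the dates in the question are M/D/YYYY and that the `MINDate` for 30001 is meant to be 4/5/2022.

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (custID, MINDate, MAXDATE, CLOSEDATE).
# Assumption: 30001's MINDate is 4/5/2022.
HAVE = [
    (10001, date(2022, 1, 1), date(2022, 12, 31), None),
    (20001, date(2022, 7, 6), date(2022, 12, 31), None),
    (30001, date(2022, 4, 5), date(2022, 6, 10), date(2022, 6, 10)),
    (40001, date(2022, 1, 1), date(2022, 12, 31), None),
]


def month_starts(first, last):
    """Return the first day of every month touched by [first, last]."""
    months = []
    current = first.replace(day=1)
    while current <= last:
        months.append(current)
        # Step to the first day of the following month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return months


def classify(rows):
    """Count customers per (group, month_start), mirroring the CASE logic."""
    counts = Counter()
    for _cust_id, min_date, max_date, close_date in rows:
        months = month_starts(min_date, max_date)
        partial_year = len(months) < 12
        close_month = close_date.replace(day=1) if close_date else None
        for position, month in enumerate(months):
            if partial_year and month == close_month:
                grp = "Grp3"  # month in which the account closed
            elif partial_year and position == 0:
                grp = "Grp2"  # first month of a partial-year account
            else:
                grp = "Grp1"
            counts[(grp, month)] += 1
    return counts


def pivot(counts, year=2022):
    """Turn the counts into one 12-element list (Jan..Dec) per group."""
    return {
        grp: [counts[(grp, date(year, m, 1))] for m in range(1, 13)]
        for grp in ("Grp1", "Grp2", "Grp3")
    }


if __name__ == "__main__":
    for grp, row in pivot(classify(HAVE)).items():
        print(grp, *row)
```

With the four sample rows this should give the same three rows of counts as the query result above. Like the SQL, it treats the first month of any account with fewer than 12 active months as `Grp2`, whether or not that month is actually partial.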