Using a CASE statement to compare TIMESTAMP data in the previous and current rows and act on the difference


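I need help building a query with Teradata (version 16.0+) OLAP functions that compares rows and eliminates rolling duplicates in a Teradata table under the conditions described below.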

I have the following 9 records in table ABC.

Existing Data (Table - ABC)

#  ACCOUNT_ID       EXT_REF_NO        SERIAL_NUM    RECORD_START_DT     RECORD_END_DT
1  100000000002195  8495752450757852  341FE4E6A1AF  8/13/2019 12:24:42  8/20/2019 23:59:59
2  100000000002195  8495752450757852  342FE4E6A1AF  8/21/2019 08:49:08  8/25/2019 23:59:59
3  100000000002195  8495752450757852  343FE4E6A1AF  8/27/2019 02:42:46  8/26/2019 23:59:59
4  100000000002195  8495752450757852  344FE4E6A1AF  8/28/2019 06:33:50  8/28/2019 23:59:59
5  100000000002195  8495752450757852  345FE4E6A1AF  8/30/2019 02:35:32  8/31/2019 23:59:59
6  100000000002195  8495752450757852  346FE4E6A1AF  9/2/2019 00:25:05   9/1/2019 23:59:59
7  100000000002195  8495752450757852  347FE4E6A1AF  9/3/2019 03:33:28   9/3/2019 23:59:59
8  100000000002195  8495752450757852  348FE4E6A1AF  9/4/2019 18:35:45   9/8/2019 23:59:59
9  100000000002195  8495752450757852  349FE4E6A1AF  9/10/2019 11:22:54  3/16/2020 23:59:59

Output

#  ACCOUNT_ID       EXT_REF_NO        SERIAL_NUM    RECORD_START_DT     RECORD_END_DT
1  100000000002195  8495752450757852  341FE4E6A1AF  8/13/2019 12:24:42  8/26/2019 23:59:59
2  100000000002195  8495752450757852  342FE4E6A1AF  8/28/2019 06:33:50  8/28/2019 23:59:59
3  100000000002195  8495752450757852  343FE4E6A1AF  8/30/2019 02:35:32  9/1/2019 23:59:59
4  100000000002195  8495752450757852  345FE4E6A1AF  9/3/2019 03:33:28   9/8/2019 23:59:59
5  100000000002195  8495752450757852  346FE4E6A1AF  9/10/2019 11:22:54  3/16/2020 23:59:59

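  1. RECORD_END_DT should always be greater than RECORD_START_DT.

  2. The previous row is only considered when the current row's RECORD_START_DT = (previous row's RECORD_END_DT + 1 day); if the difference is more than 1 day, it is not considered.

  3. You can see that rows 3 and 6 violate point 1. This is actually a data-entry bug affecting records that expire on the same day. For the calculation you can virtually treat RECORD_START_DT as 8/26/2019 00:00:00 and 9/2/2019 00:00:00 for rows 3 and 6, respectively.

  4. ACCOUNT_ID, EXT_REF_NO, and SERIAL_NUM should all three be treated as the partitioning columns.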
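
As a rough illustration of rules 1 to 3 (in Python rather than Teradata SQL, purely so the logic can be run in isolation), the sketch below applies the virtual start-date correction and the one-day adjacency check to rows 2 and 3 of the sample data. The helper names `fixed_start` and `is_contiguous` are hypothetical, and the correction subtracts a full day while keeping the time of day, on the assumption that only the calendar date matters for the comparison.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S"


def fixed_start(start: datetime, end: datetime) -> datetime:
    """Rule 3 (assumed fix): a start later than the end is moved back one day."""
    return start - timedelta(days=1) if start > end else start


def is_contiguous(prev_end: datetime, cur_start: datetime) -> bool:
    """Rule 2: the current row continues the previous one only if its
    start date is exactly the day after the previous end date."""
    return cur_start.date() == prev_end.date() + timedelta(days=1)


# Rows 2 and 3 from the sample table
row2_end = datetime.strptime("8/25/2019 23:59:59", FMT)
row3_start = datetime.strptime("8/27/2019 02:42:46", FMT)
row3_end = datetime.strptime("8/26/2019 23:59:59", FMT)

row3_virtual = fixed_start(row3_start, row3_end)
print(row3_virtual.date(), is_contiguous(row2_end, row3_virtual))
```

With the corrected start, row 3 falls on the day after row 2 ends, so the two rows belong to the same merged range.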

I tried the query below. It returns only a single row, with the minimum DEVICE_START_DATE and the maximum DEVICE_END_DATE, as shown here.

ACCOUNT_ID       EXT_REF           SERIAL_NUM    DEVICE_START_DATE          DEVICE_END_DATE
100000000002195  8495752450757852  341FE4E6A1AF  8/13/2017 12:24:42.000000  9/16/2017 23:59:59.000000

Query:

SELECT
    ACCOUNT_ID,
    EXT_REF,
    SERIAL_NUM,
    CASE
        WHEN (B.DIFF_DAYS <= 1 OR B.DIFF_DAYS IS NULL) THEN
            MIN(DEVICE_START_DATE)
            OVER (PARTITION BY ACCOUNT_ID, EXT_REF, SERIAL_NUM
                  ORDER BY DEVICE_END_DATE DESC)
        WHEN (B.DIFF_DAYS > 1) THEN
            MIN(DEVICE_START_DATE)
            OVER (PARTITION BY ACCOUNT_ID, EXT_REF, SERIAL_NUM
                  ORDER BY DEVICE_END_DATE DESC)
    END AS DEVICE_START_DATE,
    DEVICE_END_DATE
FROM
    (SELECT
         A.ACCOUNT_ID,
         A.EXT_REF,
         A.SERIAL_NUM,
         A.DEVICE_START_DATE,
         A.DEVICE_START_DATE_VIRTUAL,
         A.DEVICE_END_DATE,
         MIN(A.DEVICE_END_DATE)
         OVER (PARTITION BY A.ACCOUNT_ID, A.EXT_REF, A.SERIAL_NUM
               ORDER BY A.DEVICE_END_DATE
               ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS DEVICE_END_DATE_PREVIOUS_ROW,
         TRUNC(A.DEVICE_START_DATE_VIRTUAL) - TRUNC(DEVICE_END_DATE_PREVIOUS_ROW) AS DIFF_DAYS
     FROM
         (SELECT
              ACCOUNT_ID,
              EXT_REF,
              SERIAL_NUM,
              DEVICE_START_DATE,
              CASE WHEN DEVICE_START_DATE > DEVICE_END_DATE
                   THEN (DEVICE_START_DATE - INTERVAL '1' DAY)
                   ELSE DEVICE_START_DATE
              END AS DEVICE_START_DATE_VIRTUAL,
              DEVICE_END_DATE
          FROM NDW_XH_TEMP_TABLES.TEST) A) B
QUALIFY
    ROW_NUMBER()
    OVER (PARTITION BY ACCOUNT_ID, EXT_REF, SERIAL_NUM
          ORDER BY DEVICE_END_DATE DESC) = 1;

Tags: sql, teradata, olap, rollup, teradata-sql-assistant

1 Answer

You need nested OLAP functions. This should work as expected:

SELECT 
   ACCOUNT_ID
  ,EXT_REF_NO
  ,SERIAL_NUM
  ,Coalesce(Lag(next_start)
            Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
                  ORDER BY next_start NULLS LAST)
           ,min_start) AS RECORD_START_DT

-- If your Teradata version doesn't support LAG/LEAD you must switch to the MAX version
--  ,Coalesce(Max(next_start)
--            Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
--                  ORDER BY next_start NULLS LAST
--                  ROWS BETWEEN 1 Preceding AND 1 Preceding)
--           ,min_start) AS RECORD_START_DT
  ,RECORD_END_DT  
FROM
 (
   SELECT
      ACCOUNT_ID
     ,EXT_REF_NO
     ,SERIAL_NUM
     ,RECORD_START_DT
     ,RECORD_END_DT

     -- to check for a gap
     ,Lag(fixed_start)
      Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
            ORDER BY fixed_start DESC) AS next_start
--     ,Max(fixed_start)
--      Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
--            ORDER BY fixed_start DESC
--            ROWS BETWEEN 1 Preceding AND 1 Preceding) AS next_start

     -- used in the outer COALESCE to get the min start for the 1st group
     ,Min(RECORD_START_DT)
      Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO) AS min_start

     -- gap detection
     ,CASE WHEN Cast(RECORD_END_DT AS DATE) + 1 = Cast(next_start AS DATE) THEN 0 ELSE 1 END AS flag
   FROM
    ( -- fixing the bad data first
      SELECT t.*
        ,CASE WHEN RECORD_START_DT > RECORD_END_DT THEN RECORD_START_DT - INTERVAL '1' DAY ELSE RECORD_START_DT END AS fixed_start
      FROM tab AS t
    ) AS fixed_data
   QUALIFY flag = 1
 ) AS dt

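This looks for gaps. After the filter flag = 1 is applied, the current row holds the maximum end date of its group and the previous row holds the matching start date. The outer SELECT finally adds that start date to the current row.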
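
To check this reasoning against the sample data outside the database, here is a minimal Python sketch of the same gap-detection idea. It is an illustration under stated assumptions, not a replacement for the SQL: the function name `merge_rows` is hypothetical, all rows are assumed to belong to a single ACCOUNT_ID / EXT_REF_NO partition, and only the start and end timestamps are modelled.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S"

# (RECORD_START_DT, RECORD_END_DT) for the nine sample rows, one partition
ROWS = [
    ("8/13/2019 12:24:42", "8/20/2019 23:59:59"),
    ("8/21/2019 08:49:08", "8/25/2019 23:59:59"),
    ("8/27/2019 02:42:46", "8/26/2019 23:59:59"),
    ("8/28/2019 06:33:50", "8/28/2019 23:59:59"),
    ("8/30/2019 02:35:32", "8/31/2019 23:59:59"),
    ("9/2/2019 00:25:05", "9/1/2019 23:59:59"),
    ("9/3/2019 03:33:28", "9/3/2019 23:59:59"),
    ("9/4/2019 18:35:45", "9/8/2019 23:59:59"),
    ("9/10/2019 11:22:54", "3/16/2020 23:59:59"),
]


def merge_rows(rows):
    """Merge rows whose corrected start date is the day after the previous end date."""
    parsed = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in rows]
    # Fix the bad data first: a start later than the end is moved back one day.
    fixed = sorted((s - timedelta(days=1) if s > e else s, e) for s, e in parsed)

    merged = []
    group_start = min(s for s, _ in parsed)  # plays the role of min_start
    for i, (_, end) in enumerate(fixed):
        next_start = fixed[i + 1][0] if i + 1 < len(fixed) else None
        # Equivalent of flag = 1: no next row, or the next row does not
        # start on the day after this row ends.
        gap = next_start is None or end.date() + timedelta(days=1) != next_start.date()
        if gap:
            merged.append((group_start, end))
            group_start = next_start  # start date carried to the next kept row
    return merged


for start, end in merge_rows(ROWS):
    print(start, "->", end)
```

On the sample rows this yields five ranges whose start and end timestamps match the Output table in the question.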
