Databricks SQL - Fill the gaps between the min and max dates


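I am working with this sample data on Azure Databricks SQL, where there are gaps between the dates. The values in the BASED_DATE column reflect when each record was stored in the system. For the expected output, however, I need the date to always be the first day of the month, and any missing months to be filled in together with the corresponding agency. How can I achieve this?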

This is the code I am currently working with, but it does not give me the expected recursive output:

CREATE TEMPORARY VIEW xTableA
AS
SELECT 111 AS ACCOUNTID, '2022-02-05' AS BASED_DATE, 'XYZ' AS AGENCY
UNION ALL
SELECT 111, '2022-02-05', 'ABC'
UNION ALL
SELECT 111, '2022-05-25', 'BGG'
UNION ALL
SELECT 111, '2022-07-13', 'DXA'
UNION ALL
SELECT 111, '2023-02-22', 'VGQ'
UNION ALL
SELECT 114, '2022-08-09', 'QYD'
UNION ALL
SELECT 114, '2022-12-26', 'OMG'
UNION ALL
SELECT 114, '2023-03-12', 'TNK';

WITH xBased_Date AS (
SELECT 
    ACCOUNTID,
    AGENCY,
    CAST(date_trunc('MONTH', BASED_DATE) AS DATE) AS StartDate,
    LEAD(CAST(date_trunc('MONTH', BASED_DATE) AS DATE),1) OVER (ORDER BY CAST(date_trunc('MONTH', BASED_DATE) AS DATE)) AS EndDate
FROM xTableA
),
xRecursive AS (
    SELECT
    ACCOUNTID,
    add_months(StartDate, 1) AS xDate
    FROM xBased_Date
    WHERE add_months(StartDate, 1) <= EndDate
)

SELECT
a.ACCOUNTID,
ADD_MONTHS(a.StartDate, 1) AS BASED_DATE,
a.AGENCY
FROM xBased_Date a
CROSS JOIN xRecursive b
WHERE ADD_MONTHS(a.StartDate, 1) <= a.EndDate
ORDER BY 1, 2

Sample data:

| ACCOUNTID | BASED_DATE | AGENCY |
| --------- | ---------- | -------|
| 111       | 2022-02-05 | XYZ    |
| 111       | 2022-02-05 | ABC    |
| 111       | 2022-05-25 | BGG    |
| 111       | 2022-07-13 | DXA    |
| 111       | 2023-02-22 | VGQ    |
| 114       | 2022-08-09 | QYD    |
| 114       | 2022-12-26 | OMG    |
| 114       | 2023-03-12 | TNK    |

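Expected output: I need every date converted to the first day of its month, and each missing month filled with the value of the last agency on the account.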

| ACCOUNTID | BASED_DATE | AGENCY |
| --------- | ---------- | -------|
| 111       | 2022-02-01 | AA     |
| 111       | 2022-02-01 | AZ     |
| 111       | 2022-03-01 | AZ     |
| 111       | 2022-04-01 | AZ     |
| 111       | 2022-05-01 | AB     |
| 111       | 2022-06-01 | AB     |
| 111       | 2022-07-01 | AA     |
| 111       | 2022-08-01 | AA     |
| 111       | 2022-09-01 | AA     |
| 111       | 2022-10-01 | AA     |
| 111       | 2022-11-01 | AA     |
| 111       | 2022-12-01 | AA     |
| 111       | 2023-01-01 | AA     |
| 111       | 2023-02-01 | AF     |
| 114       | 2022-08-01 | AY     |
| 114       | 2022-09-01 | AY     |
| 114       | 2022-10-01 | AY     |
| 114       | 2022-11-01 | AY     |
| 114       | 2022-12-01 | AX     |
| 114       | 2023-01-01 | AX     |
| 114       | 2023-02-01 | AX     |
| 114       | 2023-03-01 | AG     |

If you can give me any ideas or similar code in another SQL dialect, I would appreciate it; I will translate it to Databricks myself.

sql sql-server databricks databricks-sql
1 Answer

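I solved my problem by searching the community and collecting answers to nearly identical scenarios. I took the logic and applied some modifications to get the expected output I wanted: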

CREATE TEMPORARY VIEW xTableA
AS
SELECT 111 AS ACCOUNTID, '2022-02-05' AS BASED_DATE, 'XYZ' AS AGENCY
UNION ALL
SELECT 111, '2022-02-05', 'ABC'
UNION ALL
SELECT 111, '2022-05-25', 'BGG'
UNION ALL
SELECT 111, '2022-07-13', 'DXA'
UNION ALL
SELECT 111, '2023-02-22', 'VGQ'
UNION ALL
SELECT 114, '2022-08-09', 'QYD'
UNION ALL
SELECT 114, '2022-12-26', 'OMG'
UNION ALL
SELECT 114, '2023-03-12', 'TNK';

WITH xBased_Date AS (
SELECT 
    ACCOUNTID,
    AGENCY,
    CAST(date_trunc('MONTH', BASED_DATE) AS DATE) AS StartDate,
    LEAD(CAST(date_trunc('MONTH', BASED_DATE) AS DATE),1) OVER (ORDER BY CAST(date_trunc('MONTH', BASED_DATE) AS DATE)) AS EndDate
FROM xTableA

)
,xRecursive AS (
    SELECT 
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS num
    FROM 
        (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) AS t1
    CROSS JOIN 
        (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) AS t2
)

SELECT 
    m.ACCOUNTID,
    ADD_MONTHS(m.StartDate, n.num) AS BASED_DATE,
    m.AGENCY
FROM xBased_Date m
CROSS JOIN xRecursive n
WHERE ADD_MONTHS(m.StartDate, n.num) <= m.EndDate
ORDER BY m.ACCOUNTID, BASED_DATE;

If you have a better solution for my xRecursive CTE that makes it more dynamic, that would help me.
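
One possible way to make the month generator dynamic, offered as a sketch rather than a tested drop-in: Spark SQL's built-in `sequence` and `explode` functions can build the list of months for each row, so no fixed-size numbers CTE is needed. The CTE names `xMonths` and `xBounds` and the column `NextStart` are made up for this illustration. The sketch assumes that `LEAD` should be partitioned by `ACCOUNTID`, and that each record should run up to the month before the account's next record. When two records on one account fall in the same month (as `XYZ` and `ABC` do), which of them gets carried forward is arbitrary, because the sample data has no column to break the tie.

```sql
-- Sketch only: assumes the temporary view xTableA defined above exists.
WITH xMonths AS (
    -- Truncate every BASED_DATE to the first day of its month.
    SELECT
        ACCOUNTID,
        AGENCY,
        CAST(date_trunc('MONTH', BASED_DATE) AS DATE) AS StartDate
    FROM xTableA
),
xBounds AS (
    -- NextStart is the month of the next record on the same account
    -- (NULL for the account's last record).
    SELECT
        ACCOUNTID,
        AGENCY,
        StartDate,
        LEAD(StartDate) OVER (PARTITION BY ACCOUNTID ORDER BY StartDate) AS NextStart
    FROM xMonths
)
SELECT
    b.ACCOUNTID,
    m.BASED_DATE,
    b.AGENCY
FROM xBounds b
-- Generate one row per month from StartDate to the month before NextStart.
-- GREATEST keeps at least the record's own month when NextStart is NULL
-- or falls in the same month as StartDate.
LATERAL VIEW explode(
    sequence(
        b.StartDate,
        GREATEST(b.StartDate, add_months(COALESCE(b.NextStart, b.StartDate), -1)),
        INTERVAL 1 MONTH
    )
) m AS BASED_DATE
ORDER BY b.ACCOUNTID, m.BASED_DATE;
```

If you would rather keep the cross-join shape of the query above, the hand-written 25-row numbers CTE could be produced with the `range` table-valued function instead, for example `SELECT id AS num FROM range(0, 25)`, although the upper bound would then still be fixed.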
