Calculating running revenue per cohort in SQL with the SUM() OVER () window function

Question (votes: 0, answers: 1)

I am trying to calculate a running sum of revenue within each cohort. I am using the following expression for this:

round(sum(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start order by invoice_date), 2) AS accrual_cum
  • cohort_start is the name of the cohort group
  • subtotal is the revenue amount

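It does not give me what I want, because it just returns the total for each cohort group. What I want is the running sum within each cohort group: for the first transaction in the first cohort the revenue is A, for the second transaction it becomes A + B, for the third A + B + C, and so on.

What am I doing wrong?

Here is the whole query, in case it helps: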

WITH 
invoices AS (
    SELECT * FROM {{ source ('bq_raw_data', 'invoices') }}
), 

customer_first_transaction AS (
    SELECT
        customer_id,
        MIN(invoice_date) AS cohort_start
    FROM invoices
    GROUP BY customer_id
)

SELECT
    cft.cohort_start AS customer_cohort,
    invoice_date,
    EXTRACT(DAY FROM (current_date-cft.cohort_start)) AS days_since_cohort_start,
    i.product_id AS product,
    coalesce(companies.region, '0') as region,
    round(SUM(i.subtotal), 2) as accrual_revenue,
    round(sum(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start order by invoice_date), 2) AS accrual_cum, 
    case when i.product_id = 'stp_9' then ROUND(SUM(i.subtotal) / 12, 2) 
    else round(SUM(i.subtotal), 2) 
    end AS p_and_l_revenue, 
    ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN sum(i.subtotal) / 12 ELSE sum(i.subtotal) END) OVER 
    (PARTITION BY cft.cohort_start ORDER BY invoice_date), 2) 
    AS p_and_l_cum

FROM invoices i
left JOIN customer_first_transaction cft ON i.customer_id = cft.customer_id
LEFT JOIN {{ source('bq_raw_data', 'customers') }} AS customers ON customers.id = i.customer_id
LEFT JOIN {{ source('bq_raw_data', 'companies') }} AS companies ON companies.id = customers.seedlegals_company_id
group by cft.cohort_start, invoice_date, i.product_id, companies.region
order by customer_cohort, invoice_date
sql google-bigquery window-functions cumsum
1 answer (0 votes)

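The query you wrote is almost correct, but there is a small problem with how SUM() is used. The SUM in front of the OVER clause does not need a second SUM nested inside it: a window SUM already computes the cumulative total on its own. Nesting is only required when the window runs over rows that are grouped in the same SELECT. The window should also state its frame explicitly, because with only ORDER BY the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, and all rows of a cohort that share an invoice_date then receive the same total.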
A window function that computes a running total (cumulative sum) looks like this:

SUM(i.subtotal) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS accrual_cum

For p_and_l_cum, it would be:

SUM(CASE WHEN i.product_id = 'stp_9' THEN i.subtotal / 12 ELSE i.subtotal END) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS p_and_l_cum
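
The effect of the explicit ROWS frame is easiest to see on rows that tie on the ORDER BY column. The following is only a sketch in BigQuery standard SQL: the inline rows, the CTE name daily, and the column revenue are made up for illustration, and product_id is added to the second ORDER BY purely so that tied rows come out in a stable order.

```sql
-- Hypothetical, already aggregated rows: two products share 2024-01-01.
WITH daily AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2024-01-01' AS cohort_start,
           DATE '2024-01-01' AS invoice_date,
           'stp_1' AS product_id,
           100.0 AS revenue),
    (DATE '2024-01-01', DATE '2024-01-01', 'stp_9', 120.0),
    (DATE '2024-01-01', DATE '2024-01-05', 'stp_1', 50.0)
  ])
)
SELECT
  invoice_date,
  product_id,
  revenue,
  -- No explicit frame: defaults to RANGE ... CURRENT ROW, so rows with the
  -- same invoice_date are peers and share one total.
  SUM(revenue) OVER (
    PARTITION BY cohort_start
    ORDER BY invoice_date
  ) AS default_frame,
  -- Explicit ROWS frame: the total grows row by row.
  SUM(revenue) OVER (
    PARTITION BY cohort_start
    ORDER BY invoice_date, product_id
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS rows_frame
FROM daily
ORDER BY invoice_date, product_id;
```

With these three rows, default_frame comes out as 220, 220, 270, while rows_frame comes out as 100, 220, 270, which is the A, A + B, A + B + C pattern the question asks for.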

Here is how the query could be modified. The GROUP BY moves into a CTE so that the window functions run over already aggregated rows and need no nested SUM:

WITH
invoices AS (
    SELECT * FROM {{ source ('bq_raw_data', 'invoices') }}
),
customer_first_transaction AS (
    SELECT
        customer_id,
        MIN(invoice_date) AS cohort_start
    FROM invoices
    GROUP BY customer_id
),
-- Aggregate first: one row per cohort, invoice date, product and region.
grouped AS (
    SELECT
        cft.cohort_start AS customer_cohort,
        invoice_date,
        i.product_id AS product,
        COALESCE(companies.region, '0') AS region,
        SUM(i.subtotal) AS revenue
    FROM invoices i
    LEFT JOIN customer_first_transaction cft ON i.customer_id = cft.customer_id
    LEFT JOIN {{ source('bq_raw_data', 'customers') }} AS customers ON customers.id = i.customer_id
    LEFT JOIN {{ source('bq_raw_data', 'companies') }} AS companies ON companies.id = customers.seedlegals_company_id
    GROUP BY cft.cohort_start, invoice_date, i.product_id, companies.region
)
-- Window functions now see plain columns, so a single SUM ... OVER is enough.
SELECT
    customer_cohort,
    invoice_date,
    EXTRACT(DAY FROM (current_date - customer_cohort)) AS days_since_cohort_start,
    product,
    region,
    ROUND(revenue, 2) AS accrual_revenue,
    ROUND(SUM(revenue) OVER (PARTITION BY customer_cohort ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS accrual_cum,
    ROUND(CASE WHEN product = 'stp_9' THEN revenue / 12 ELSE revenue END, 2) AS p_and_l_revenue,
    ROUND(SUM(CASE WHEN product = 'stp_9' THEN revenue / 12 ELSE revenue END) OVER (PARTITION BY customer_cohort ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS p_and_l_cum
FROM grouped
ORDER BY customer_cohort, invoice_date;
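
If the GROUP BY has to stay in the same SELECT as the window function, BigQuery also accepts an aggregate as the argument of the window SUM: the inner SUM runs once per group and the outer SUM ... OVER accumulates those group totals. The sketch below uses hypothetical inline invoices (only the column names mirror the question) and adds product_id to the window ORDER BY as an assumed tie-breaker.

```sql
-- Hypothetical invoice rows; the first two fall into the same group.
WITH invoices AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2024-01-01' AS cohort_start,
           DATE '2024-01-01' AS invoice_date,
           'stp_1' AS product_id,
           60.0 AS subtotal),
    (DATE '2024-01-01', DATE '2024-01-01', 'stp_1', 40.0),
    (DATE '2024-01-01', DATE '2024-01-01', 'stp_9', 120.0),
    (DATE '2024-01-01', DATE '2024-01-05', 'stp_1', 50.0)
  ])
)
SELECT
  cohort_start,
  invoice_date,
  product_id,
  SUM(subtotal) AS accrual_revenue,
  -- Inner SUM: total per GROUP BY row.
  -- Outer SUM ... OVER: running total of those per-group totals.
  SUM(SUM(subtotal)) OVER (
    PARTITION BY cohort_start
    ORDER BY invoice_date, product_id
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS accrual_cum
FROM invoices
GROUP BY cohort_start, invoice_date, product_id
ORDER BY cohort_start, invoice_date, product_id;
```

For this data the three grouped rows have accrual_revenue 100, 120, 50 and accrual_cum 100, 220, 270.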