I am trying to calculate a running sum (of revenue) within each cohort group. I am using the following expression for this:
round(sum(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start order by invoice_date), 2) AS accrual_cum
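It does not give me what I want, because it just returns the total for each cohort group. What I want is a running sum within each cohort group: for the first transaction in the first cohort the revenue is A, for the second it becomes A + B, for the third A + B + C, and so on.
What am I doing wrong here?
Here is the whole query, just in case: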
WITH
invoices AS (
SELECT * FROM {{ source ('bq_raw_data', 'invoices') }}
),
customer_first_transaction AS (
SELECT
customer_id,
MIN(invoice_date) AS cohort_start
FROM invoices
GROUP BY customer_id
)
SELECT
cft.cohort_start AS customer_cohort,
invoice_date,
EXTRACT(DAY FROM (current_date-cft.cohort_start)) AS days_since_cohort_start,
i.product_id AS product,
coalesce(companies.region, '0') as region,
round(SUM(i.subtotal), 2) as accrual_revenue,
round(sum(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start order by invoice_date), 2) AS accrual_cum,
case when i.product_id = 'stp_9' then ROUND(SUM(i.subtotal) / 12, 2)
else round(SUM(i.subtotal), 2)
end AS p_and_l_revenue,
ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN sum(i.subtotal) / 12 ELSE sum(i.subtotal) END) OVER
(PARTITION BY cft.cohort_start ORDER BY invoice_date), 2)
AS p_and_l_cum
FROM invoices i
left JOIN customer_first_transaction cft ON i.customer_id = cft.customer_id
LEFT JOIN {{ source('bq_raw_data', 'customers') }} AS customers ON customers.id = i.customer_id
LEFT JOIN {{ source('bq_raw_data', 'companies') }} AS companies ON companies.id = customers.seedlegals_company_id
group by cft.cohort_start, invoice_date, i.product_id, companies.region
order by customer_cohort, invoice_date
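Your SQL query is almost correct, and the nested SUM() is not the problem. Because the query has a GROUP BY, window functions are evaluated after aggregation, so the argument of the window function has to be an aggregate or a grouping column. The inner SUM(i.subtotal) computes the revenue of each group, and the outer SUM(...) OVER accumulates those group totals. Dropping the inner SUM would make the query fail, because i.subtotal is neither grouped nor aggregated.
What the OVER clause is missing is an explicit frame. With an ORDER BY and no frame, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so all rows in a cohort that share the same invoice_date are treated as peers and receive the same cumulative value. Since the query also groups by product and region, several rows can share a date, which is the most likely reason the column looks like a per-group total rather than a running sum.
A running total that advances row by row looks like this:
ROUND(SUM(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS accrual_cum
For p_and_l_cum, it should be:
ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN SUM(i.subtotal) / 12 ELSE SUM(i.subtotal) END) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS p_and_l_cum
With a ROWS frame, rows that share an invoice_date are accumulated in an arbitrary order. If you need that order to be stable, add i.product_id and companies.region to the ORDER BY inside the OVER clause.
This is how you should modify the query:
WITH
invoices AS (
SELECT * FROM {{ source ('bq_raw_data', 'invoices') }}
),
customer_first_transaction AS (
SELECT
customer_id,
MIN(invoice_date) AS cohort_start
FROM invoices
GROUP BY customer_id
)
SELECT
cft.cohort_start AS customer_cohort,
invoice_date,
EXTRACT(DAY FROM (current_date - cft.cohort_start)) AS days_since_cohort_start,
i.product_id AS product,
coalesce(companies.region, '0') as region,
ROUND(SUM(i.subtotal), 2) as accrual_revenue,
ROUND(SUM(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS accrual_cum,
case when i.product_id = 'stp_9' then ROUND(SUM(i.subtotal) / 12, 2)
else ROUND(SUM(i.subtotal), 2)
end AS p_and_l_revenue,
ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN SUM(i.subtotal) / 12 ELSE SUM(i.subtotal) END) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS p_and_l_cum
FROM invoices i
LEFT JOIN customer_first_transaction cft ON i.customer_id = cft.customer_id
LEFT JOIN {{ source('bq_raw_data', 'customers') }} AS customers ON customers.id = i.customer_id
LEFT JOIN {{ source('bq_raw_data', 'companies') }} AS companies ON companies.id = customers.seedlegals_company_id
GROUP BY cft.cohort_start, invoice_date, i.product_id, companies.region
ORDER BY customer_cohort, invoice_date;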
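To see how the window frame changes a running total in a grouped query, here is a minimal sketch you can run locally. It is not the original BigQuery/dbt setup: it assumes Python's built-in sqlite3 module (SQLite 3.25 or newer, for window functions) as a stand-in, and the table layout and sample rows are made up for illustration. It compares the default frame with an explicit ROWS frame on the same nested SUM(SUM(...)) expression.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    cohort_start TEXT,     -- hypothetical: cohort already attached to each invoice
    invoice_date TEXT,
    product_id   TEXT,
    subtotal     INTEGER
);
INSERT INTO invoices VALUES
    ('2023-01-01', '2023-01-01', 'a', 10),
    ('2023-01-01', '2023-01-01', 'a', 5),
    ('2023-01-01', '2023-01-01', 'b', 20),
    ('2023-01-01', '2023-01-02', 'a', 30),
    ('2023-02-01', '2023-02-01', 'a', 7);
""")

rows = conn.execute("""
SELECT
    cohort_start,
    invoice_date,
    product_id,
    SUM(subtotal) AS revenue,
    -- Default frame (RANGE): rows sharing an invoice_date are peers
    -- and all receive the same cumulative value.
    SUM(SUM(subtotal)) OVER (
        PARTITION BY cohort_start ORDER BY invoice_date
    ) AS cum_default_frame,
    -- Explicit ROWS frame with a tiebreaker: advances one row at a time.
    SUM(SUM(subtotal)) OVER (
        PARTITION BY cohort_start ORDER BY invoice_date, product_id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cum_rows_frame
FROM invoices
GROUP BY cohort_start, invoice_date, product_id
ORDER BY cohort_start, invoice_date, product_id
""").fetchall()

for row in rows:
    print(row)
```

In the first cohort, the two groups dated 2023-01-01 share one value in cum_default_frame, while cum_rows_frame grows group by group (A, then A + B, then A + B + C).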