How to calculate a SUM over a specific window in BigQuery SQL?


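I have a dataset in BigQuery with some float columns; let's call them amount_1, amount_2, and so on. I also have a day column, which is a timestamp in date format (for example '2024-04-05 00:00:00 UTC'). There is exactly one row per day, which means the data is already grouped.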

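For each day of the month, I want to compute SUM(amount_1) over a window that ends on the last day of the previous month and goes back 120 days. For example, for day = '2024-04-05 00:00:00 UTC', I want SUM(amount_1) over the dates between '2023-12-03' and '2024-03-31'. In other words, the sum should be the same for every day within a given month.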
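
The intended window can be written down as plain date arithmetic. The sketch below uses Python rather than BigQuery SQL so that the boundaries can be checked outside the warehouse. The helpers window_for and windowed_sum and the all-ones amount_1 data are hypothetical. The sketch assumes the window is inclusive at both ends, which is what makes '2023-12-03' through '2024-03-31' exactly 120 days (29 + 31 + 29 + 31).

```python
from datetime import date, timedelta

def window_for(day: date) -> tuple[date, date]:
    """Inclusive (start, end) of the 120-day window for `day`."""
    end = day.replace(day=1) - timedelta(days=1)  # last day of the previous month
    start = end - timedelta(days=119)             # 119 days back gives 120 days inclusive
    return start, end

# Hypothetical data: one row per day, amount_1 = 1.0 on every row.
amount_1 = {date(2023, 1, 1) + timedelta(days=i): 1.0 for i in range(500)}

def windowed_sum(day: date) -> float:
    """SUM(amount_1) over the rows whose day falls inside window_for(day)."""
    start, end = window_for(day)
    return sum(v for d, v in amount_1.items() if start <= d <= end)

print(window_for(date(2024, 4, 5)))    # (datetime.date(2023, 12, 3), datetime.date(2024, 3, 31))
print(windowed_sum(date(2024, 4, 5)))  # 120.0
```

The window depends only on the month of day, so every day in April 2024 gets the same (start, end) pair and therefore the same sum, as described above.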

I tried this, but it didn't work:

with LastDayPreviousMonth AS (
  SELECT
    day,
    LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
  FROM
    main_table
)
select
M.day
,  SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day

This doesn't work. To debug it, I asked for MIN(day) to see which window I am actually summing over, like this:

with LastDayPreviousMonth AS (
  SELECT
    day,
    LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
  FROM
    main_table
)
select
M.day
,  SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day

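I expected the row with day = '2024-04-05 00:00:00 UTC' to have a min_day of '2023-12-03', but min_day is always '2024-01-01 00:00:00 UTC' for every day in April, '2023-12-01 00:00:00 UTC' for every day in March, and so on. Can anyone point out what I'm doing wrong?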

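Edit: When I say it "doesn't work", I mean that the window I'm summing over is not the window I want. I verified this by doing the calculation manually in Excel, and I used the MIN function to confirm that I'm not actually looking at the right window.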
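
The observed min_day values follow from how the frame is defined. With ORDER BY on last_day_previous_month, the RANGE offset is measured between each row's own key and the current row's key, not between calendar days. For an April row (key 2024-03-31), the frame therefore takes every row whose own key falls within 120 days before that date, which means all rows from January through April 2024. The Python sketch below replays that rule on one synthetic row per day. It illustrates RANGE semantics under that assumption rather than running BigQuery itself, and the helper names are hypothetical.

```python
from datetime import date, timedelta

SECONDS_PER_DAY = 86_400
FRAME_SECONDS = 10_368_000  # the PRECEDING offset from the query

def last_day_previous_month(d: date) -> date:
    return d.replace(day=1) - timedelta(days=1)

def unix_seconds(d: date) -> int:
    # Mirrors UNIX_SECONDS(CAST(d AS TIMESTAMP)) for a date at midnight UTC.
    return (d - date(1970, 1, 1)).days * SECONDS_PER_DAY

# Synthetic table: one row per day, as the question describes.
days = [date(2023, 1, 1) + timedelta(days=i) for i in range(500)]
order_key = {d: unix_seconds(last_day_previous_month(d)) for d in days}

def frame_rows(current: date) -> list[date]:
    """Rows in RANGE BETWEEN FRAME_SECONDS PRECEDING AND 0 PRECEDING,
    where the ORDER BY key is each row's own last_day_previous_month."""
    k = order_key[current]
    return [d for d in days if k - FRAME_SECONDS <= order_key[d] <= k]

print(min(frame_rows(date(2024, 4, 5))))   # 2024-01-01
print(min(frame_rows(date(2024, 3, 10))))  # 2023-12-01
print(len(frame_rows(date(2024, 4, 5))))   # 121
```

Under this model, the frame for an April day covers 2024-01-01 through 2024-04-30 (121 rows, including April itself) instead of 2023-12-03 through 2024-03-31. That is consistent with both the min_day values reported and the mismatch found in Excel.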

google-bigquery
1 Answer

My guess is that the join is expanding the dataset. Please use group by 1 so that there is only one entry per day.
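
To illustrate the fan-out guessed at here, the Python sketch below models an INNER JOIN on day, with two rows per day in main_table as in the test data below (unnest([1,2]) as test). Without group by 1, the CTE also has two rows per day, so each day ends up with 2 × 2 = 4 joined rows. Grouping the CTE to one row per day brings that back to 2. The join helper is a hypothetical stand-in for SQL. If main_table really has a single row per day, the join does not multiply anything.

```python
from datetime import date, timedelta

# Toy main_table: two rows per day for three days, like unnest([1, 2]) as test.
main_table = [(date(2024, 1, 1) + timedelta(days=i), test) for i in range(3) for test in (1, 2)]

def inner_join_on_day(left, right):
    """Every (left, right) pair whose day (first field) matches."""
    return [(l, r) for l in left for r in right if l[0] == r[0]]

# CTE without GROUP BY: one output row per main_table row.
cte_ungrouped = [(d,) for d, _ in main_table]
# CTE with GROUP BY 1: one output row per distinct day.
cte_grouped = [(d,) for d in sorted({d for d, _ in main_table})]

print(len(inner_join_on_day(main_table, cte_ungrouped)))  # 12 (4 per day)
print(len(inner_join_on_day(main_table, cte_grouped)))    # 6 (2 per day)
```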

You want to go back 120 days, which is why you used 10368000 seconds (120 × 86,400).
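
Below is a quick check of that constant and of where a frame of that size starts, written in Python for illustration. Subtracting 10,368,000 seconds from 2024-03-31 00:00 UTC lands on 2023-12-02, one day before the '2023-12-03' the question expects. The RANGE frame includes both of its ends, so an inclusive 120-day window would need a 119-day offset (10,281,600 seconds). This assumes the ORDER BY keys are midnight-UTC timestamps.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400
FRAME_SECONDS = 10_368_000  # offset used in RANGE BETWEEN ... PRECEDING

print(FRAME_SECONDS / SECONDS_PER_DAY)  # 120.0

# Key of an April 2024 row: midnight UTC on the last day of March.
anchor = datetime(2024, 3, 31, tzinfo=timezone.utc)

lower = anchor - timedelta(seconds=FRAME_SECONDS)
print(lower.date())  # 2023-12-02

# Both frame ends are inclusive, so 120 calendar days need a 119-day offset.
inclusive_start = anchor - timedelta(seconds=119 * SECONDS_PER_DAY)
print(inclusive_start.date())  # 2023-12-03
```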

with 
main_table as (
  SELECT 1 amount_1, * from unnest(generate_date_array("2023-01-01",current_date()) ) as day,unnest([1,2]) as test
),
LastDayPreviousMonth AS (
  SELECT
    day,
    LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
  FROM
    main_table
    group by 1 --- to have for each day only one row
)
select
M.day,
last_day_previous_month
,  SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day

No join is needed:

with 
main_table as (
  SELECT 1 amount_1, * from unnest(generate_date_array("2023-01-01",current_date()) ) as day,unnest([1,2]) as test
),
LastDayPreviousMonth AS (
  SELECT
    *,#day,
    LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
  FROM
    main_table
    #group by 1 --- to have for each day only one row
)
select
M.day,
last_day_previous_month
,  SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
LastDayPreviousMonth as M