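I need to compute the cumulative sum of a column and reset it once it reaches a threshold. In the example below, the cumulative sum resets when it would reach 10 or when the label changes.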
Label | Value | Cumulative sum |
---|---|---|
One | 1 | 1 |
One | 2 | 3 |
One | 4 | 7 |
One | 6 | 6 |
One | 3 | 9 |
Two | 1 | 1 |
Two | 2 | 3 |
Two | 1 | 4 |
I tried the following in BigQuery: SUM(value) OVER (PARTITION BY label ORDER BY dummy_sequence) AS cumulative_sum,
but it does not give the expected result.
Any help is much appreciated.
I think the BigQuery MOD function can do the job.
Something like:
WITH dataset AS (
  SELECT 'One' AS Label, 1 AS Value, 1 AS sequence
  UNION ALL
  SELECT 'One' AS Label, 2 AS Value, 2 AS sequence
  UNION ALL
  SELECT 'One' AS Label, 4 AS Value, 3 AS sequence
  UNION ALL
  SELECT 'One' AS Label, 6 AS Value, 4 AS sequence
  UNION ALL
  SELECT 'One' AS Label, 3 AS Value, 5 AS sequence
  UNION ALL
  SELECT 'Two' AS Label, 1 AS Value, 1 AS sequence
  UNION ALL
  SELECT 'Two' AS Label, 2 AS Value, 2 AS sequence
  UNION ALL
  SELECT 'Two' AS Label, 1 AS Value, 3 AS sequence
)
SELECT
  Label,
  Value,
  -- running total per label, wrapped modulo 10
  MOD(SUM(Value) OVER (PARTITION BY Label ORDER BY sequence), 10) AS cumulative_sum
FROM dataset
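This matches the expected output only while the running total stays below 10. MOD wraps the total rather than restarting from the current value, so for label One it returns 1, 3, 7, 3, 6 instead of the expected 1, 3, 7, 6, 9.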
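As a quick check outside BigQuery, the following Python sketch compares a modulo wrap with a reset that restarts from the current value. The function names `mod_wrap` and `reset_on_threshold` are made up for this illustration, and the reset rule (restart when the total would reach 10 or more) is an assumption read off the example table in the question.

```python
from itertools import accumulate


def mod_wrap(values, threshold=10):
    """Running total wrapped modulo threshold, like MOD(SUM(...) OVER (...), 10)."""
    return [total % threshold for total in accumulate(values)]


def reset_on_threshold(values, threshold=10):
    """Running total that restarts from the current value once it would reach threshold."""
    result = []
    running = 0
    for value in values:
        # Assumed rule: restart when the new total would be >= threshold.
        running = value if running + value >= threshold else running + value
        result.append(running)
    return result


label_one = [1, 2, 4, 6, 3]  # the Value column for label One
print(mod_wrap(label_one))            # [1, 3, 7, 3, 6]
print(reset_on_threshold(label_one))  # [1, 3, 7, 6, 9]
```

The two agree for label Two, where the total never reaches 10, which is why the difference is easy to miss on small samples.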
You could use a RECURSIVE CTE to accumulate the values conditionally.
CREATE TEMP TABLE sample_data AS (
  WITH _sample_data AS (
    SELECT 'One' AS Label, 1 AS Value, 1 AS expected_cumulative_sum
    UNION ALL SELECT 'One', 2, 3
    UNION ALL SELECT 'One', 4, 7
    UNION ALL SELECT 'One', 6, 6
    UNION ALL SELECT 'One', 3, 9
    UNION ALL SELECT 'Two', 1, 1
    UNION ALL SELECT 'Two', 2, 3
    UNION ALL SELECT 'Two', 1, 4
    UNION ALL SELECT 'Three', 5, 5
    UNION ALL SELECT 'Three', 4, 9
    UNION ALL SELECT 'Three', 8, 8
    UNION ALL SELECT 'Three', 7, 7
    UNION ALL SELECT 'Three', 5, 5
    UNION ALL SELECT 'Three', 4, 9
  )
  -- No ORDER BY in the window, so the numbering within a label is not guaranteed;
  -- on a real table, order by a sequence column.
  SELECT *, ROW_NUMBER() OVER (PARTITION BY Label) AS row_num
  FROM _sample_data
);
WITH RECURSIVE calculate_cumulative_sum AS (
  -- anchor: the first row of each label starts the running total
  SELECT label, value, row_num, value AS cumulative_sum
  FROM sample_data
  WHERE row_num = 1
  UNION ALL
  -- recursive step: extend the previous total, or restart from the current value
  SELECT
    s.label, s.value, s.row_num,
    IF(
      -- may want to decide between '>' and '>='
      s.value + c.cumulative_sum >= 10,
      s.value,
      s.value + c.cumulative_sum
    ) AS cumulative_sum
  FROM sample_data AS s
  INNER JOIN calculate_cumulative_sum AS c
    ON s.label = c.label AND s.row_num = c.row_num + 1
)
SELECT label, row_num, value, cumulative_sum
FROM calculate_cumulative_sum
ORDER BY label, row_num
;
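The choice between '>' and '>=' flagged in the query comment only matters when a total lands exactly on the threshold. A minimal Python sketch of the same rule, useful for checking expected values by hand; `reset_cumsum` and its `inclusive` parameter are hypothetical names for this illustration, not part of the query above.

```python
def reset_cumsum(values, threshold=10, inclusive=True):
    """Mirror the recursive step: extend the total or restart from the current value.

    inclusive=True corresponds to '>=' in the query, inclusive=False to '>'.
    """
    result = []
    running = None
    for value in values:
        if running is None:
            running = value  # anchor row: the first value starts the total
        else:
            candidate = running + value
            hit = candidate >= threshold if inclusive else candidate > threshold
            running = value if hit else candidate
        result.append(running)
    return result


print(reset_cumsum([4, 6, 3], inclusive=True))   # [4, 6, 9]
print(reset_cumsum([4, 6, 3], inclusive=False))  # [4, 10, 3]
```

With '>=' a total of exactly 10 never appears in the output, while with '>' it does and the reset happens one row later.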