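I am trying to add two window functions to my query. One calculates a rolling total per customer, and the other simply adds a row number per customer.
While the rolling sum works fine, the row number is not ordered correctly. I want to add a descending row number so that I can keep only the rows where the row number is 1 for each customer, which is where I would find that customer's final total.
I am using Snowflake SQL.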
CREATE TABLE transactions (
customer_id INTEGER
,txn_date DATE
,txn_type VARCHAR(10)
,txn_amount INTEGER
);
INSERT INTO transactions (customer_id, txn_date, txn_type, txn_amount)
VALUES
(1, '2020-01-02', 'deposit', 312),
(1, '2020-01-02', 'deposit', 312),
(1, '2020-03-05', 'purchase', 612),
(1, '2020-03-05', 'purchase', 612),
(1, '2020-03-17', 'deposit', 324),
(1, '2020-03-17', 'deposit', 324),
(1, '2020-03-19', 'purchase', 664),
(1, '2020-03-19', 'purchase', 664);
This is the query I am currently using:
SELECT
customer_id
,txn_type
,txn_date
,txn_amount
,CASE WHEN txn_type = 'deposit'
THEN txn_amount
ELSE txn_amount * -1
END AS new_amount
,SUM(new_amount) OVER (
PARTITION BY customer_id
ORDER BY customer_id, txn_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
,ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY customer_id, txn_date DESC) AS rn
FROM
transactions
ORDER BY
customer_id
,txn_date;
It gives me these results:
customer_id | txn_type | txn_date | txn_amount | new_amount | running_total | rn |
---|---|---|---|---|---|---|
1 | deposit | 2020-01-02 | 312 | 312 | 312 | 7 |
1 | deposit | 2020-01-02 | 312 | 312 | 624 | 8 |
1 | purchase | 2020-03-05 | 612 | -612 | 12 | 5 |
1 | purchase | 2020-03-05 | 612 | -612 | -600 | 6 |
1 | deposit | 2020-03-17 | 324 | 324 | -276 | 3 |
1 | deposit | 2020-03-17 | 324 | 324 | 48 | 4 |
1 | purchase | 2020-03-19 | 664 | -664 | -616 | 1 |
1 | purchase | 2020-03-19 | 664 | -664 | -1280 | 2 |
Whereas I would like this:
customer_id | txn_type | txn_date | txn_amount | new_amount | running_total | rn |
---|---|---|---|---|---|---|
1 | deposit | 2020-01-02 | 312 | 312 | 312 | 8 |
1 | deposit | 2020-01-02 | 312 | 312 | 624 | 7 |
1 | purchase | 2020-03-05 | 612 | -612 | 12 | 6 |
1 | purchase | 2020-03-05 | 612 | -612 | -600 | 5 |
1 | deposit | 2020-03-17 | 324 | 324 | -276 | 4 |
1 | deposit | 2020-03-17 | 324 | 324 | 48 | 3 |
1 | purchase | 2020-03-19 | 664 | -664 | -616 | 2 |
1 | purchase | 2020-03-19 | 664 | -664 | -1280 | 1 |
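So what am I doing wrong here?
Ultimately, I want to partition the row number not only by customer ID but also by month, so that I can pick the latest record of each month and see what running_total is at that point.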
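Because you have duplicate data, you will get inconsistent results. Rows that tie on txn_date can be numbered in either order, and the two window functions are free to break those ties differently. Here is a way to create a pseudo transaction id in a CTE and then reference that pseudo id in the second window function of the main query.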
with cte as (
select
customer_id
,txn_type
,txn_date
,txn_amount
,CASE WHEN txn_type = 'deposit'
THEN txn_amount
ELSE txn_amount * -1
END AS new_amount
,ROW_NUMBER() over (
partition by customer_id
order by customer_id, txn_date) as trans_id
from tmp.data_sci.transactions
)
select
customer_id
,txn_type
,txn_date
,txn_amount
,new_amount
,SUM(new_amount) OVER (
PARTITION BY customer_id
ORDER BY customer_id, txn_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
,ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY customer_id, trans_id DESC) AS rn
from cte
order by
customer_id,
rn desc;
Output:
CUSTOMER_ID | TXN_TYPE | TXN_DATE | TXN_AMOUNT | NEW_AMOUNT | RUNNING_TOTAL | RN |
---|---|---|---|---|---|---|
1 | deposit | 2020-01-02 | 312 | 312 | 312 | 8 |
1 | deposit | 2020-01-02 | 312 | 312 | 624 | 7 |
1 | purchase | 2020-03-05 | 612 | -612 | 12 | 6 |
1 | purchase | 2020-03-05 | 612 | -612 | -600 | 5 |
1 | deposit | 2020-03-17 | 324 | 324 | -276 | 4 |
1 | deposit | 2020-03-17 | 324 | 324 | 48 | 3 |
1 | purchase | 2020-03-19 | 664 | -664 | -616 | 2 |
1 | purchase | 2020-03-19 | 664 | -664 | -1280 | 1 |
If this is the route you want to take, you can also simplify the running-total calculation in the main query, because you no longer need to worry about duplicates:
,SUM(new_amount) OVER (
PARTITION BY customer_id
ORDER BY customer_id, trans_id) AS running_total
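For the end goal mentioned in the question (the latest record per customer per month, together with its running total), the same pseudo id can be reused. The following is only a rough sketch, not a tested solution: it assumes the transactions table from the question, uses Snowflake's QUALIFY clause to filter on a window function, and introduces txn_month as a hypothetical column alias that does not appear in the original post.

```sql
-- Sketch only: assumes the transactions table from the question.
-- txn_month is an illustrative alias, not part of the original schema.
WITH cte AS (
    SELECT
        customer_id
        ,txn_type
        ,txn_date
        ,txn_amount
        ,CASE WHEN txn_type = 'deposit'
              THEN txn_amount
              ELSE txn_amount * -1
         END AS new_amount
        -- Pseudo transaction id: fixes one order for rows sharing a txn_date.
        ,ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY txn_date) AS trans_id
    FROM transactions
)
SELECT
    customer_id
    ,DATE_TRUNC('month', txn_date) AS txn_month
    ,txn_date
    -- Running total over all of the customer's rows, ordered by the unique trans_id.
    ,SUM(new_amount) OVER (
        PARTITION BY customer_id
        ORDER BY trans_id) AS running_total
FROM cte
-- QUALIFY filters after the window functions are computed,
-- so only the last transaction of each customer-month is kept.
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id, DATE_TRUNC('month', txn_date)
    ORDER BY trans_id DESC) = 1
ORDER BY
    customer_id
    ,txn_month;
```

With the sample data this should leave one row for January 2020 and one for March 2020, carrying the running totals 624 and -1280 shown in the output table above. Because the filter lives in QUALIFY rather than WHERE, the running total is still accumulated across every transaction before the rows are reduced.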