Why do two different window functions give different sort orders?

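I'm trying to add two window functions to my query. One computes a running total for each customer; the other simply adds a row number for each customer.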

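The running total works fine, but the row numbers are not ordered correctly. I want a descending row number so that, for each customer, I can keep only the row with rn = 1, which is where I'd find that customer's final total.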

I'm using Snowflake SQL.

CREATE TABLE transactions (
    customer_id INTEGER
    ,txn_date DATE
    ,txn_type VARCHAR(10)
    ,txn_amount INTEGER
);

INSERT INTO transactions (customer_id, txn_date, txn_type, txn_amount) 
VALUES
(1, '2020-01-02', 'deposit', 312),
(1, '2020-01-02', 'deposit', 312),
(1, '2020-03-05', 'purchase', 612),
(1, '2020-03-05', 'purchase', 612),
(1, '2020-03-17', 'deposit', 324),
(1, '2020-03-17', 'deposit', 324),
(1, '2020-03-19', 'purchase', 664),
(1, '2020-03-19', 'purchase', 664);

This is the query I'm currently using:

SELECT 
  customer_id
  ,txn_type
  ,txn_date
  ,txn_amount
  ,CASE WHEN txn_type = 'deposit'
        THEN txn_amount
        ELSE txn_amount * -1
   END AS new_amount
  ,SUM(new_amount) OVER (
   PARTITION BY customer_id
   ORDER BY customer_id, txn_date
   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
  ,ROW_NUMBER() OVER (
   PARTITION BY customer_id
   ORDER BY customer_id, txn_date DESC) AS rn
FROM
  transactions
ORDER BY
  customer_id
  ,txn_date;

It gives me these results:

customer_id  txn_type  txn_date    txn_amount  new_amount  running_total  rn
1            deposit   2020-01-02  312         312         312            7
1            deposit   2020-01-02  312         312         624            8
1            purchase  2020-03-05  612         -612        12             5
1            purchase  2020-03-05  612         -612        -600           6
1            deposit   2020-03-17  324         324         -276           3
1            deposit   2020-03-17  324         324         48             4
1            purchase  2020-03-19  664         -664        -616           1
1            purchase  2020-03-19  664         -664        -1280          2

Whereas I expected this:

customer_id  txn_type  txn_date    txn_amount  new_amount  running_total  rn
1            deposit   2020-01-02  312         312         312            8
1            deposit   2020-01-02  312         312         624            7
1            purchase  2020-03-05  612         -612        12             6
1            purchase  2020-03-05  612         -612        -600           5
1            deposit   2020-03-17  324         324         -276           4
1            deposit   2020-03-17  324         324         48             3
1            purchase  2020-03-19  664         -664        -616           2
1            purchase  2020-03-19  664         -664        -1280          1

So what am I doing wrong here?

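Eventually I want to partition the row number not only by customer ID but also by month, so that I can take the latest record of each month and see what the running_total is at that point.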
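A minimal sketch of that end goal, using Python's built-in sqlite3 module as a stand-in for Snowflake (an assumption: SQLite 3.25 or later is needed for window functions, and only the windowing logic carries over). strftime('%Y-%m', txn_date) buckets rows by month; in Snowflake, DATE_TRUNC('MONTH', txn_date) would play that role, and QUALIFY rn = 1 could replace the outer WHERE. The tiebreak column is hypothetical: it uses SQLite's rowid, which Snowflake tables do not have, so there a real unique transaction id would be needed. The running total spans the customer's whole history while rn restarts each month, and both order by the same unique key, so they cannot disagree.

```python
import sqlite3

# In-memory SQLite stand-in for the question's table (needs SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, txn_date DATE, "
    "txn_type VARCHAR(10), txn_amount INTEGER)"
)
data = [
    ("2020-01-02", "deposit", 312), ("2020-01-02", "deposit", 312),
    ("2020-03-05", "purchase", 612), ("2020-03-05", "purchase", 612),
    ("2020-03-17", "deposit", 324), ("2020-03-17", "deposit", 324),
    ("2020-03-19", "purchase", 664), ("2020-03-19", "purchase", 664),
]
conn.executemany("INSERT INTO transactions VALUES (1, ?, ?, ?)", data)

QUERY = """
WITH cte AS (
    SELECT customer_id, txn_date,
           CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END AS new_amount,
           strftime('%Y-%m', txn_date) AS txn_month,
           rowid AS tiebreak  -- hypothetical unique key (SQLite-only)
    FROM transactions
), ranked AS (
    SELECT customer_id, txn_month, txn_date,
           -- running total over the customer's whole history
           SUM(new_amount) OVER (
               PARTITION BY customer_id ORDER BY txn_date, tiebreak
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
           -- rn = 1 marks the latest row of each customer-month
           ROW_NUMBER() OVER (
               PARTITION BY customer_id, txn_month
               ORDER BY txn_date DESC, tiebreak DESC) AS rn
    FROM cte
)
SELECT customer_id, txn_month, txn_date, running_total
FROM ranked
WHERE rn = 1
ORDER BY customer_id, txn_month
"""
monthly = conn.execute(QUERY).fetchall()
for row in monthly:
    print(row)
```

With the sample data this yields one row per month: 624 at the end of 2020-01 and -1280 at the end of 2020-03.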

sql snowflake-cloud-data-platform window-functions
1 Answer

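Because your data contains duplicates, you get inconsistent results. Several rows tie on (customer_id, txn_date), and when a window function's ORDER BY has ties, the order of the tied rows is not deterministic. Each window function resolves those ties on its own, so the order used for the running total is not guaranteed to be the exact reverse of the order ROW_NUMBER uses with DESC. Below is a way to create a pseudo transaction id in a CTE and then reference that id in the second window function of the main query.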

with cte as (
 select 
  customer_id
  ,txn_type
  ,txn_date
  ,txn_amount
  ,CASE WHEN txn_type = 'deposit'
        THEN txn_amount
        ELSE txn_amount * -1
   END AS new_amount
  ,ROW_NUMBER() over (
   partition by customer_id 
   order by customer_id, txn_date) as trans_id
  from transactions
)
select 
 customer_id 
 ,txn_type
 ,txn_date
 ,txn_amount
 ,new_amount
 ,SUM(new_amount) OVER (
   PARTITION BY customer_id
   ORDER BY customer_id, txn_date
   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
 ,ROW_NUMBER() OVER (
   PARTITION BY customer_id
   ORDER BY customer_id, trans_id DESC) AS rn
from cte
order by 
 customer_id, 
 rn desc;

Output:

CUSTOMER_ID  TXN_TYPE  TXN_DATE    TXN_AMOUNT  NEW_AMOUNT  RUNNING_TOTAL  RN
1            deposit   2020-01-02  312         312         312            8
1            deposit   2020-01-02  312         312         624            7
1            purchase  2020-03-05  612         -612        12             6
1            purchase  2020-03-05  612         -612        -600           5
1            deposit   2020-03-17  324         324         -276           4
1            deposit   2020-03-17  324         324         48             3
1            purchase  2020-03-19  664         -664        -616           2
1            purchase  2020-03-19  664         -664        -1280          1
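The duplicate-row explanation above can be illustrated with a small Python model (a sketch, not Snowflake's implementation). Each modeled window function sorts the same eight rows independently. The model assumes tied rows keep their input order, which is what Python's stable sorted() does even with reverse=True; Snowflake guarantees no tie order at all. physical_id is a hypothetical stand-in for a unique key such as trans_id.

```python
# A Python model of the two window functions, not Snowflake's implementation.
# Assumption: rows tied on txn_date keep their input order, because Python's
# sort is stable (also with reverse=True). Snowflake promises no tie order.
from itertools import accumulate

# (physical_id, txn_date, new_amount) for customer 1; physical_id is hypothetical
rows = [
    (0, "2020-01-02", 312), (1, "2020-01-02", 312),
    (2, "2020-03-05", -612), (3, "2020-03-05", -612),
    (4, "2020-03-17", 324), (5, "2020-03-17", 324),
    (6, "2020-03-19", -664), (7, "2020-03-19", -664),
]


def running_total(rows, key):
    """Model of SUM(new_amount) OVER (ORDER BY key ROWS UNBOUNDED PRECEDING)."""
    ordered = sorted(rows, key=key)
    totals = accumulate(r[2] for r in ordered)
    return {r[0]: t for r, t in zip(ordered, totals)}


def row_number_desc(rows, key):
    """Model of ROW_NUMBER() OVER (ORDER BY key DESC)."""
    ordered = sorted(rows, key=key, reverse=True)
    return {r[0]: n for n, r in enumerate(ordered, start=1)}


def by_date(r):
    return r[1]  # not unique: duplicate rows tie


def by_date_then_id(r):
    return (r[1], r[0])  # unique: no ties left


for key in (by_date, by_date_then_id):
    rt = running_total(rows, key)
    rn = row_number_desc(rows, key)
    # (running_total, rn) pairs in the rows' original order
    print(key.__name__, [(rt[i], rn[i]) for i in range(len(rows))])
```

With by_date, this particular tie order happens to reproduce the rn column from the question, so rn = 1 lands on the -616 row. With the unique by_date_then_id key, rn is the exact reverse of the running-total order and rn = 1 lands on -1280.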

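Note that the running total in the SQL query above still orders by txn_date, so its tie-breaking is not guaranteed to line up with trans_id. If this is the route you want to take, order the running total by trans_id as well; because trans_id is unique within each customer, there are no duplicates left to worry about: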

,SUM(new_amount) OVER (
 PARTITION BY customer_id
 ORDER BY customer_id, trans_id) AS running_total
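A runnable sketch of that suggestion, using Python's built-in sqlite3 module as a stand-in for Snowflake (an assumption: SQLite 3.25 or later is needed for window functions, and SQLite is not Snowflake, so only the windowing logic carries over). new_amount is computed in the CTE because SQLite, unlike Snowflake, does not let a SELECT reference its own column aliases. Both window functions order by trans_id, and the explicit ROWS frame from the question is kept.

```python
import sqlite3

# In-memory SQLite stand-in for Snowflake (window functions need SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, txn_date DATE, "
    "txn_type VARCHAR(10), txn_amount INTEGER)"
)
data = [
    ("2020-01-02", "deposit", 312), ("2020-01-02", "deposit", 312),
    ("2020-03-05", "purchase", 612), ("2020-03-05", "purchase", 612),
    ("2020-03-17", "deposit", 324), ("2020-03-17", "deposit", 324),
    ("2020-03-19", "purchase", 664), ("2020-03-19", "purchase", 664),
]
conn.executemany("INSERT INTO transactions VALUES (1, ?, ?, ?)", data)

QUERY = """
WITH cte AS (
    SELECT customer_id, txn_type, txn_date, txn_amount,
           CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END AS new_amount,
           -- ties on txn_date are broken arbitrarily, but only once, here
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY txn_date) AS trans_id
    FROM transactions
)
SELECT customer_id, txn_date, new_amount,
       -- both window functions order by the same unique key
       SUM(new_amount) OVER (
           PARTITION BY customer_id ORDER BY trans_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
       ROW_NUMBER() OVER (
           PARTITION BY customer_id ORDER BY trans_id DESC) AS rn
FROM cte
ORDER BY customer_id, rn DESC
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Because the tied rows are identical, the arbitrary choice inside trans_id does not change the output: the row with rn = 1 always carries the final total of -1280.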