Hive SQL partitioning


I have one column that uses row_number() over (partition...):

row_number() OVER (partition BY customer_id ORDER BY order_date, order_number) AS order_row,

Another column uses over (partition...) to compute the cumulative spend:

sum(spend_amount) OVER (partition BY customer_id ORDER BY order_date, order_number) as cumulative_spend,

How can I flag, in another column, which order_row is the first to reach $100? In the data below, order_number 123458 (order_row = 3, cumulative spend $110.00) is the one I want to flag.

For example:

customer_number    order_date  order_number  spend_amount order_row  cumulative_spend
abcdefg            01/01/2023  123456        10.00        1          10.00
abcdefg            01/01/2023  123457        50.00        2          60.00
abcdefg            14/01/2023  123458        50.00        3          110.00
abcdefg            23/01/2023  123459        20.00        4          130.00   
sql hive
1 Answer

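Subtract the current row's value from the cumulative sum to get the running total as of the previous row. The row to flag is the one where the cumulative sum is at or above 100 while that previous total is still below 100.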

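Note that if the ORDER BY columns can contain duplicates, you should use ROWS UNBOUNDED PRECEDING. Without an explicit frame, the default is RANGE UNBOUNDED PRECEDING, which adds all rows that tie on the sort key into the sum at once. ROWS is also faster.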
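The effect of ties on the two frame types can be seen in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example is runnable; SQLite 3.25 or later is needed for window functions), and it deliberately orders by order_date alone so that two rows tie. The table name orders and the ISO date strings are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id TEXT, order_date TEXT, spend_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("abcdefg", "2023-01-01", 10.0),
        ("abcdefg", "2023-01-01", 50.0),  # same order_date as above: a tie
        ("abcdefg", "2023-01-14", 50.0),
    ],
)

rows = conn.execute(
    """
    SELECT order_date,
           spend_amount,
           -- Default frame (RANGE): tied rows are summed together.
           SUM(spend_amount) OVER (PARTITION BY customer_id
               ORDER BY order_date) AS range_sum,
           -- Explicit ROWS frame: each row gets its own running total.
           SUM(spend_amount) OVER (PARTITION BY customer_id
               ORDER BY order_date ROWS UNBOUNDED PRECEDING) AS rows_sum
    FROM orders
    """
).fetchall()

for row in rows:
    print(row)
```

Both rows dated 2023-01-01 get a range_sum of 60.0, because the RANGE frame treats them as peers. With ROWS, the two tied rows get different running totals, although which of them comes first is not defined unless the ORDER BY is made unique, for example by adding order_number as in the question.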

SELECT
  *,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date, order_number) AS order_row,
  SUM(spend_amount) OVER (PARTITION BY customer_id
      ORDER BY order_date, order_number ROWS UNBOUNDED PRECEDING) AS cumulative_spend,
  -- Flag the row where the running total first reaches 100:
  -- the total including this row is >= 100, and the total before it was < 100.
  CASE WHEN SUM(spend_amount) OVER (PARTITION BY customer_id
      ORDER BY order_date, order_number ROWS UNBOUNDED PRECEDING) >= 100
    AND SUM(spend_amount) OVER (PARTITION BY customer_id
      ORDER BY order_date, order_number ROWS UNBOUNDED PRECEDING) - spend_amount < 100
       THEN 1 ELSE 0 END AS cumulative_spend_reached_100
FROM YourTable;
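As a sanity check, the approach described in this answer can be run against the sample data from the question. The sketch below again uses Python's sqlite3 module in place of Hive (an assumption, chosen only so the example is self-contained), stores the dates as ISO strings so they sort correctly, and computes the window columns in a subquery so the running total does not have to be repeated in the CASE expression.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable ("
    "customer_id TEXT, order_date TEXT, order_number INTEGER, spend_amount REAL)"
)
# Sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        ("abcdefg", "2023-01-01", 123456, 10.0),
        ("abcdefg", "2023-01-01", 123457, 50.0),
        ("abcdefg", "2023-01-14", 123458, 50.0),
        ("abcdefg", "2023-01-23", 123459, 20.0),
    ],
)

query = """
SELECT order_number,
       order_row,
       cumulative_spend,
       -- Running total reaches 100 on this row but was below 100 before it.
       CASE WHEN cumulative_spend >= 100
             AND cumulative_spend - spend_amount < 100
            THEN 1 ELSE 0 END AS cumulative_spend_reached_100
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id
               ORDER BY order_date, order_number) AS order_row,
           SUM(spend_amount) OVER (PARTITION BY customer_id
               ORDER BY order_date, order_number
               ROWS UNBOUNDED PRECEDING) AS cumulative_spend
    FROM YourTable
) AS t
ORDER BY order_row
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

Only order_number 123458 (order_row 3, cumulative spend 110.0) comes back with the flag set to 1. The check relies on spend_amount never being negative; with refunds in the data, the running total could cross 100 more than once and several rows would be flagged.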