Summing totals over different date ranges


I have a table like this:

ymd         cus_id  order  order_value  refunds
2023-01-01  12020   3      134          1
2023-06-04  27383   1      80           0
2023-07-13  23823   2      111          2
2023-04-22  12020   7      323          3

but much larger. It records the order, order_value and refunds for each cus_id per day.

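I need to summarize this table so that there is one row per cus_id and date range, holding the sums of order, order_value and refunds for that range (full dataset, last 4 weeks, last 8 weeks and last 12 weeks). The final result would look something like the table below, so each cus_id gets 4 date ranges.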

date_range  cus_id  sum_order  sum_order_value  sum_refunds
all         12020   23         1340             9
4 weeks     12020   3          152              1
8 weeks     12020   8          423              2
12 weeks    12020   20         1023             7

N.B. the values in the tables are made up, so the two datasets may not match.

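What is the best way to do this? I was thinking of calculating each date range separately, adding a new date_range column to each, and then taking the union of all 4 date ranges so that the final result looks like the above, but I'm not sure whether that is the most efficient approach.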
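
For reference, the approach described above might look roughly like the sketch below. It is only an illustration: the table name orders_daily is a placeholder that does not appear in the question, and the ranges are assumed to be counted back from current_date.

```sql
-- Sketch of the "one aggregate per range, then union" idea.
-- Assumption: orders_daily is a hypothetical name for the source table.
select 'all' as date_range,
    cus_id,
    sum("order") as sum_order,
    sum(order_value) as sum_order_value,
    sum(refunds) as sum_refunds
from orders_daily
group by cus_id
union all
select '4 weeks', cus_id, sum("order"), sum(order_value), sum(refunds)
from orders_daily
where ymd >= current_date - interval '28' day
group by cus_id
union all
select '8 weeks', cus_id, sum("order"), sum(order_value), sum(refunds)
from orders_daily
where ymd >= current_date - interval '56' day
group by cus_id
union all
select '12 weeks', cus_id, sum("order"), sum(order_value), sum(refunds)
from orders_daily
where ymd >= current_date - interval '84' day
group by cus_id
order by cus_id, date_range
```

Each branch is written as a separate pass over the table, which is the part that raises the efficiency concern. Note also that ordering by the date_range label sorts alphabetically, not by range length.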

sql amazon-athena
1 Answer

One option is to create a "date_range" CTE (common table expression) that defines each range, and then join your table to that CTE.

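The CTE is just a union of 4 different select statements, giving one row per date range with a start date and an end date. I added a sort_order column to help order the final result.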

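Here is a working example. It does not match your output above because that dataset was not provided. I used the sample data you provided and added some rows of my own: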

with date_range as (
    -- one row per range; the rows are distinct, so union all avoids a needless dedup
    select 'all' as date_range,
        cast('1900-01-01' as date) as start_date,
        current_date as end_date,
        1 as sort_order
    union all
    select '4 weeks' as date_range,
        cast((current_date - interval '28' day) as date) as start_date,
        current_date as end_date,
        2 as sort_order
    union all
    select '8 weeks' as date_range,
        cast((current_date - interval '56' day) as date) as start_date,
        current_date as end_date,
        3 as sort_order
    union all
    select '12 weeks' as date_range,
        cast((current_date - interval '84' day) as date) as start_date,
        current_date as end_date,
        4 as sort_order
),
sample_data as (
    select *
    from (
            values(cast('2023-01-01' as date), 12020, 3, 134, 1),
                (cast('2023-06-04' as date), 27383, 1, 80, 0),
                (cast('2023-07-13' as date), 23823, 2, 111, 2),
                (cast('2023-04-22' as date), 12020, 7, 323, 3),
                (cast('2023-07-22' as date), 12020, 7, 400, 4),
                (cast('2023-08-20' as date), 27383, 9, 100, 0)
        ) as test_date(ymd, cus_id, "order", order_value, refunds)
)
select date_range,
    cus_id,
    sum("order") as sum_order,
    sum(order_value) as sum_order_value,
    sum(refunds) as sum_refunds,
    sort_order
from date_range dt
    -- replace sample_data with your own table and remove the sample_data CTE above
    join sample_data td on td.ymd between dt.start_date and dt.end_date
group by date_range,
    cus_id,
    sort_order
order by cus_id,
    sort_order

This gives the result: (output table not preserved in the source)
