Snowflake - Need to show each tenant's data for the past 7 days

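I have a (possibly silly) question. I have to create a view in Snowflake with a date column, the tenant, count(number_of_requests) and count(number_of_collections). I have done that part and am happy with it.

The base code is as follows: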

select date_trunc(day, CREATED)::date as CREATED_DATE,
    TENANT,
    count(distinct case when comment = 'Non personalised order created' then SIMNO end) as NUMBER_OF_REQUESTS,
    count(distinct case when comment = 'Non personalised order collected by customer' then SIMNO end) as NUMBER_OF_COLLECTIONS
from <DATABASE>.<SCHEMA>.<TABLE>;

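The second part is where I am struggling. In the same view, they need columns that hold, per tenant, the number of requests sent 1-7 days ago, 1-12 days ago, and 1-25 days ago. The same goes for collections. I don't even know how to approach this. The required column names are listed below.

  • CREATED_DATE
  • TENANT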
  • NUMBER_OF_REQUESTS
  • NUMBER_OF_COLLECTIONS
  • NUMBER_OF_REQUESTS_1_7_DAYS_AGO
  • NUMBER_OF_REQUESTS_1_12_DAYS_AGO
  • NUMBER_OF_REQUESTS_1_25_DAYS_AGO
  • NUMBER_OF_COLLECTIONS_1_7_DAYS_AGO
  • NUMBER_OF_COLLECTIONS_1_12_DAYS_AGO
  • NUMBER_OF_COLLECTIONS_1_25_DAYS_AGO

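Any help would be much appreciated.

So far I have only managed to write the base code. I tried using BETWEEN on the dates, INTERVAL - 7, and so on, but none of it worked.

The result would look something like this: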

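| Date | Tenant | No_Requests | No_Collections | Requests_1_7_Days | Requests_1_12_Days | Requests_1_25_Days | Collections_1_7_Days | Collections_1_12_Days | Collections_1_25_Days |
|---|---|---|---|---|---|---|---|---|---|
| 2023-12-01 | Tenant1 | 65 | 46 | 455 | 780 | 1625 | 322 | 552 | 1150 |
| 2023-12-02 | Tenant2 | 56 | 53 | 392 | 672 | 1400 | 371 | 636 | 1325 |
| 2023-12-03 | Tenant3 | 124 | 94 | 868 | 1488 | 3100 | 658 | 1128 | 2350 |
| 2023-12-04 | Tenant4 | 176 | 82 | 1232 | 2112 | 4400 | 574 | 984 | 2050 |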
snowflake-cloud-data-platform calculated-columns
1 Answer

So, if we start with some fake data and your SQL:

with data(created, tenant, comment, simno) as (
    select *
    from values
     ('2023-12-01'::timestamp, 't1', 'R', 1),
     ('2023-12-01'::timestamp, 't1', 'R', 1),
     ('2023-12-01'::timestamp, 't1', 'R', 2),
     ('2023-12-01'::timestamp, 't1', 'C', 3),
     ('2023-12-01'::timestamp, 't1', 'C', 3)
)
select 
    CREATED::date as CREATED_DATE,
    TENANT,
    count(distinct case when comment = 'R' then SIMNO end) as NUMBER_OF_REQUESTS,
    count(distinct case when comment = 'C' then SIMNO end) as NUMBER_OF_COLLECTIONS
from data
group by 1,2;

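We can see that a simpler cast (CREATED::date) is enough to truncate the timestamp to a date, and that your "this code works" query needs a GROUP BY added.

But other than that, it is a good start. The key point I am going to assume you are unsure how to handle (and did not mention) is that the counts on simno are distinct counts, so the same value may appear on more than one day. Assuming again, you also want distinct counts for the 1-7, 1-12 and 1-25 day periods, so the aggregation has to happen three times, or we need to use BITMAPS.

Given that I have never used BITMAPS, let's go that way (it is the best path for large data, and therefore more interesting).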
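To make the point about distinct counts concrete, here is a minimal sketch on made-up data (the rows and the CTE name `daily` are my own illustration, not from the question). It compares adding up per-day distinct counts with a single distinct count over the whole period:

```sql
-- Hypothetical sample: simno 1 appears on both days.
with data(created_date, simno) as (
    select *
    from values
     ('2023-12-01'::date, 1),
     ('2023-12-02'::date, 1),
     ('2023-12-02'::date, 2)
), daily as (
    -- distinct count per day: 1 on the first day, 2 on the second
    select created_date, count(distinct simno) as daily_distinct
    from data
    group by 1
)
select
    (select sum(daily_distinct) from daily) as sum_of_daily_counts,     -- 3
    (select count(distinct simno) from data) as distinct_over_period;   -- 2
```

The two numbers differ because simno 1 is counted once per day in the first column, which is why the per-day counts cannot simply be summed to get a multi-day distinct count.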

with data(created, tenant, comment, simno) as (
    select *
    from values
     ('2023-12-01'::timestamp, 't1', 'R', 1),
     ('2023-12-01'::timestamp, 't1', 'R', 1),
     ('2023-12-01'::timestamp, 't1', 'R', 2),
     ('2023-12-01'::timestamp, 't1', 'C', 3),
     ('2023-12-01'::timestamp, 't1', 'C', 3),

     ('2023-12-02'::timestamp, 't1', 'R', 3),
     ('2023-12-02'::timestamp, 't1', 'R', 4),
     ('2023-12-02'::timestamp, 't1', 'R', 5),
     ('2023-12-02'::timestamp, 't1', 'C', 7),
     ('2023-12-02'::timestamp, 't1', 'C', 1)
     
), enriched_data as (
    select *
        ,created::date as created_date
        ,comment = 'R' as is_request
        ,comment = 'C' as is_collection
    from data 
), simno_seq_map as (
    select 
        simno
        ,seq8() as seq
    from (
        select distinct simno
        from data
    )
), mapped_data as (
    select d.*
        ,seq
    from enriched_data as d
    join simno_seq_map as m
        on d.simno = m.simno
), daily_bitmaps as (
    select
        created_date,
        tenant,
        BITMAP_BUCKET_NUMBER(seq) as bit_bucket,
        BITMAP_CONSTRUCT_AGG(iff(is_request, BITMAP_BIT_POSITION(seq), null)) as req_bit_bmp,
        BITMAP_CONSTRUCT_AGG(iff(is_collection, BITMAP_BIT_POSITION(seq), null)) as coll_bit_bmp
    from mapped_data
    group by 1,2,3
)
select 
    created_date,
    tenant,
    sum(r_cnt) as num_requests,
    sum(c_cnt) as num_collections
from (
    select 
        created_date,
        tenant,
        BITMAP_COUNT(BITMAP_OR_AGG(req_bit_bmp)) as r_cnt,
        BITMAP_COUNT(BITMAP_OR_AGG(coll_bit_bmp)) as c_cnt
    from daily_bitmaps
    group by 1,2, bit_bucket
)
group by 1,2
order by 1,2;

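OK, so that is a lot of work to arrive at the same place, but now that we have the daily bitmaps we can combine those rows across days, and they are far fewer rows than the raw data.

Adding days 1-7: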
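As a side note on the two helper functions used in daily_bitmaps: Snowflake's bitmaps cover 32,768 values each, so every seq value is split into a bucket number and a bit position inside that bucket. That is why the queries above group by bit_bucket first and then sum the per-bucket counts. A small sketch to inspect the split, using arbitrary input values of my own choosing:

```sql
-- Show which bucket and which bit position each value maps to.
-- Values within the same block of 32,768 share a bucket.
select
    column1 as seq,
    BITMAP_BUCKET_NUMBER(column1) as bit_bucket,
    BITMAP_BIT_POSITION(column1) as bit_position
from values (1), (32768), (32769)
order by seq;
```

With only a handful of distinct simno values, as in the sample data, there should be very few buckets, so the outer sum adds little; it matters once there are more distinct values than a single bitmap can hold.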

with data(created, tenant, comment, simno) as (
    select *
    from values
     ('2023-12-01'::timestamp, 't1', 'R', 1),
     ('2023-12-01'::timestamp, 't1', 'R', 1),
     ('2023-12-01'::timestamp, 't1', 'R', 2),
     ('2023-12-01'::timestamp, 't1', 'C', 3),
     ('2023-12-01'::timestamp, 't1', 'C', 3),

     ('2023-12-02'::timestamp, 't1', 'R', 3),
     ('2023-12-02'::timestamp, 't1', 'R', 4),
     ('2023-12-02'::timestamp, 't1', 'R', 5),
     ('2023-12-02'::timestamp, 't1', 'C', 7),
     ('2023-12-02'::timestamp, 't1', 'C', 1)
     
), enriched_data as (
    select *
        ,created::date as created_date
        ,comment = 'R' as is_request
        ,comment = 'C' as is_collection
    from data 
), simno_seq_map as (
    select 
        simno
        ,seq8() as seq
    from (
        select distinct simno
        from data
    )
), mapped_data as (
    select d.*
        ,seq
    from enriched_data as d
    join simno_seq_map as m
        on d.simno = m.simno
), daily_bitmaps as (
    select
        created_date,
        tenant,
        BITMAP_BUCKET_NUMBER(seq) as bit_bucket,
        BITMAP_CONSTRUCT_AGG(iff(is_request, BITMAP_BIT_POSITION(seq), null)) as req_bit_bmp,
        BITMAP_CONSTRUCT_AGG(iff(is_collection, BITMAP_BIT_POSITION(seq), null)) as coll_bit_bmp
    from mapped_data
    group by 1,2,3
), data_today as (
    select 
        created_date,
        tenant,
        sum(r_cnt) as num_requests,
        sum(c_cnt) as num_collections
    from (
        select 
            created_date,
            tenant,
            BITMAP_COUNT(BITMAP_OR_AGG(req_bit_bmp)) as r_cnt,
            BITMAP_COUNT(BITMAP_OR_AGG(coll_bit_bmp)) as c_cnt
        from daily_bitmaps
        group by 1,2, bit_bucket
    )
    group by 1,2
), data_1_7_win as (
    select db.*
        ,dateadd('day', w.v, db.created_date) as window_date
    from daily_bitmaps as db
    cross join (values (1),(2),(3),(4),(5),(6),(7)) as w(v)
), data_1_7 as (
    select 
        created_date,
        tenant,
        sum(r_1_7_cnt) as num_requests_1_7,
        sum(c_1_7_cnt) as num_collections_1_7
    from (
        select 
            d.created_date,
            d.tenant,
            d.bit_bucket,
            BITMAP_COUNT(BITMAP_OR_AGG(dw.req_bit_bmp)) as r_1_7_cnt,
            BITMAP_COUNT(BITMAP_OR_AGG(dw.coll_bit_bmp)) as c_1_7_cnt
        from daily_bitmaps as d
        join data_1_7_win as dw
            on d.tenant = dw.tenant
                and d.bit_bucket = dw.bit_bucket
                and d.created_date = dw.window_date
        group by 1,2, d.bit_bucket
    )
    group by 1,2
)
select 
    dt.*,
    d7.num_requests_1_7,
    d7.num_collections_1_7
from data_today as dt
left join data_1_7 as d7
    on dt.created_date = d7.created_date
        and dt.tenant = d7.tenant
order by 1,2
;

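Right, that is a big step up. First we fan each day's bitmaps out onto the following 1-7 days, which generates more rows but allows an equi-join, so it is complex but fast. Then we combine the bitmaps of the prior days and attach the counts to the per-day result. So to get 1-12 and 1-25, you will most likely want to repeat the data_1_7_win CTE, but with the values up to 12 and 25, and then repeat data_1_7 for those window values.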
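Rather than typing out 12 or 25 literal values, the offsets could be generated. The following is only a sketch under my own assumptions: the CTE names `daily` and `offsets_1_12` are hypothetical, and `daily` stands in for daily_bitmaps with a single made-up row so the snippet runs on its own.

```sql
with daily(created_date, tenant) as (
    -- stand-in for daily_bitmaps; one made-up row
    select * from values ('2023-12-01'::date, 't1')
), offsets_1_12 as (
    -- row_number() gives gap-free values 1..12
    select row_number() over (order by seq4()) as v
    from table(generator(rowcount => 12))
)
select
    d.created_date,
    d.tenant,
    -- the day on which this row counts as "1-12 days ago"
    dateadd('day', o.v, d.created_date) as window_date
from daily as d
cross join offsets_1_12 as o
order by window_date;
```

This returns 12 rows with window_date running from 2023-12-02 to 2023-12-13. Changing rowcount to 25 would give the 1-25 fan-out in the same way.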

Yes, this can be done with a range join, but if you have a lot of data, that will run much slower than this approach.
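For comparison, and as a way to cross-check the bitmap results on small data, a range-join version of the 1-7 day columns might look like the sketch below. It reuses the same fake data; the `days` CTE is a name I introduced for illustration.

```sql
with data(created, tenant, comment, simno) as (
    select *
    from values
     ('2023-12-01'::timestamp, 't1', 'R', 1),
     ('2023-12-01'::timestamp, 't1', 'R', 1),
     ('2023-12-01'::timestamp, 't1', 'R', 2),
     ('2023-12-01'::timestamp, 't1', 'C', 3),
     ('2023-12-01'::timestamp, 't1', 'C', 3),
     ('2023-12-02'::timestamp, 't1', 'R', 3),
     ('2023-12-02'::timestamp, 't1', 'R', 4),
     ('2023-12-02'::timestamp, 't1', 'R', 5),
     ('2023-12-02'::timestamp, 't1', 'C', 7),
     ('2023-12-02'::timestamp, 't1', 'C', 1)
), days as (
    -- one row per day and tenant that has any data
    select distinct created::date as created_date, tenant
    from data
)
select
    d.created_date,
    d.tenant,
    count(distinct iff(p.comment = 'R', p.simno, null)) as num_requests_1_7,
    count(distinct iff(p.comment = 'C', p.simno, null)) as num_collections_1_7
from days as d
left join data as p
    on p.tenant = d.tenant
    -- range condition: rows from 1 to 7 days before the current day
    and p.created::date between dateadd('day', -7, d.created_date)
                            and dateadd('day', -1, d.created_date)
group by 1,2
order by 1,2;
```

On this sample, 2023-12-02 gets 2 requests and 1 collection from the prior day. Note that this version returns 0 for 2023-12-01, where the bitmap query's left join leaves NULL. The range condition in the join is what tends to make this form expensive on large tables.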
