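I have a (possibly very silly) question. I have to create a view in Snowflake that has a date column, the tenant, count(number_of_requests) and count(number_of_collections). That part I've done and I'm happy with it.
The basic code is as follows: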
select date_trunc(day, CREATED)::date as CREATED_DATE,
TENANT,
count(distinct case when comment = 'Non personalised order created' then SIMNO end) as NUMBER_OF_REQUESTS,
count(distinct case when comment = 'Non personalised order collected by customer' then SIMNO end) as NUMBER_OF_COLLECTIONS
from <DATABASE>.<SCHEMA>.<TABLE>;
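The second part is what I'm struggling with. In the same view, they need columns containing the number of requests sent per tenant over the past 1-7 days, 1-12 days and 1-25 days. The same goes for collections. I don't even know how to approach this. See the required column names below.
Any help would be much appreciated.
I've basically only managed to create the basic code. I tried using BETWEEN on the date, interval - 7 and so on, but none of it worked.
The result would look something like: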
DATE        TENANT   NO_REQUESTS  NO_COLLECTIONS  REQUESTS_1_7_DAYS  REQUESTS_1_12_DAYS  REQUESTS_1_25_DAYS  COLLECTIONS_1_7_DAYS  COLLECTIONS_1_12_DAYS  COLLECTIONS_1_25_DAYS
2023-12-01  Tenant1  65           46              455                780                 1625                322                   552                    1150
2023-12-02  Tenant2  56           53              392                672                 1400                371                   636                    1325
2023-12-03  Tenant3  124          94              868                1488                3100                658                   1128                   2350
2023-12-04  Tenant4  176          82              1232               2112                4400                574                   984                    2050
So if we start with some fake data and your SQL:
with data(created, tenant, comment, simno) as (
select *
from values
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 2),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-01'::timestamp, 't1', 'C', 3)
)
select
CREATED::date as CREATED_DATE,
TENANT,
count(distinct case when comment = 'R' then SIMNO end) as NUMBER_OF_REQUESTS,
count(distinct case when comment = 'C' then SIMNO end) as NUMBER_OF_COLLECTIONS
from data
group by 1,2;
we see that you can use a simpler date cast to truncate to the day, and that your "working code" needs a GROUP BY added.
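But beyond that, it's a good start. I'm going to assume the key point you're unsure how to handle (and didn't mention) is that the count on SIMNO is a distinct count, so there can be duplicate values, and I'll assume again that for the 1-7, 1-12 and 1-25 periods you also want distinct counts. That means the aggregation has to happen three times, or we need to use BITMAPS.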
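To make the distinct-count point concrete, here is a minimal sketch with made-up data (the values and the column names daily_cnt, sum_of_daily_counts and true_distinct_count are just for illustration). SIMNO 2 appears on both days, so adding up the daily distinct counts counts it twice:

```sql
-- Hypothetical data: SIMNO 2 shows up on two different days
with data(created_date, simno) as (
    select *
    from values
        ('2023-12-01'::date, 1),
        ('2023-12-01'::date, 2),
        ('2023-12-02'::date, 2),
        ('2023-12-02'::date, 3)
), daily as (
    -- per-day distinct counts: 2 for each day
    select created_date, count(distinct simno) as daily_cnt
    from data
    group by 1
)
select
    (select sum(daily_cnt) from daily) as sum_of_daily_counts,        -- 2 + 2 = 4
    (select count(distinct simno) from data) as true_distinct_count;  -- 1, 2, 3 = 3
```

This is why a windowed SUM over the daily counts would not be enough if distinct counts are what you need across the window.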
Given I've never used BITMAPS, let's go with those (it's the best path for big data, and therefore more interesting).
with data(created, tenant, comment, simno) as (
select *
from values
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 2),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-02'::timestamp, 't1', 'R', 3),
('2023-12-02'::timestamp, 't1', 'R', 4),
('2023-12-02'::timestamp, 't1', 'R', 5),
('2023-12-02'::timestamp, 't1', 'C', 7),
('2023-12-02'::timestamp, 't1', 'C', 1)
), enriched_data as (
select *
,created::date as created_date
,comment = 'R' as is_request
,comment = 'C' as is_collection
from data
), simno_seq_map as (
select
simno
,seq8() as seq
from (
select distinct simno
from data
)
), mapped_data as (
select d.*
,seq
from enriched_data as d
join simno_seq_map as m
on d.simno = m.simno
), daily_bitmaps as (
select
created_date,
tenant,
BITMAP_BUCKET_NUMBER(seq) as bit_bucket,
BITMAP_CONSTRUCT_AGG(iff(is_request, BITMAP_BIT_POSITION(seq), null)) as req_bit_bmp,
BITMAP_CONSTRUCT_AGG(iff(is_collection, BITMAP_BIT_POSITION(seq), null)) as coll_bit_bmp
from mapped_data
group by 1,2,3
)
select
created_date,
tenant,
sum(r_cnt) as num_requests,
sum(c_cnt) as num_collections
from (
select
created_date,
tenant,
BITMAP_COUNT(BITMAP_OR_AGG(req_bit_bmp)) as r_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(coll_bit_bmp)) as c_cnt
from daily_bitmaps
group by 1,2, bit_bucket
)
group by 1,2
order by 1,2;
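A note on why the query above counts per bit_bucket and then sums: as far as I understand the Snowflake bitmap functions, each bitmap only covers a fixed range of 32,768 values, so BITMAP_BUCKET_NUMBER says which bitmap a value belongs to and BITMAP_BIT_POSITION says which bit inside it. A small sketch to see the split for yourself (the input values are arbitrary):

```sql
-- Arbitrary inputs chosen to straddle a bucket boundary
select
    column1 as seq,
    bitmap_bucket_number(column1) as bit_bucket,   -- which bitmap the value lands in
    bitmap_bit_position(column1) as bit_position   -- which bit inside that bitmap
from values (1), (2), (32768), (32769)
order by seq;
```

Because values in different buckets can never collide, the distinct count per bucket can safely be summed across buckets.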
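OK, so that was a lot of work to get to the same place, but now that we have daily bitmaps, we can combine those rows across days, and there are far fewer rows to work with than in the raw data.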
with data(created, tenant, comment, simno) as (
select *
from values
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 1),
('2023-12-01'::timestamp, 't1', 'R', 2),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-01'::timestamp, 't1', 'C', 3),
('2023-12-02'::timestamp, 't1', 'R', 3),
('2023-12-02'::timestamp, 't1', 'R', 4),
('2023-12-02'::timestamp, 't1', 'R', 5),
('2023-12-02'::timestamp, 't1', 'C', 7),
('2023-12-02'::timestamp, 't1', 'C', 1)
), enriched_data as (
select *
,created::date as created_date
,comment = 'R' as is_request
,comment = 'C' as is_collection
from data
), simno_seq_map as (
select
simno
,seq8() as seq
from (
select distinct simno
from data
)
), mapped_data as (
select d.*
,seq
from enriched_data as d
join simno_seq_map as m
on d.simno = m.simno
), daily_bitmaps as (
select
created_date,
tenant,
BITMAP_BUCKET_NUMBER(seq) as bit_bucket,
BITMAP_CONSTRUCT_AGG(iff(is_request, BITMAP_BIT_POSITION(seq), null)) as req_bit_bmp,
BITMAP_CONSTRUCT_AGG(iff(is_collection, BITMAP_BIT_POSITION(seq), null)) as coll_bit_bmp
from mapped_data
group by 1,2,3
), data_today as (
select
created_date,
tenant,
sum(r_cnt) as num_requests,
sum(c_cnt) as num_collections
from (
select
created_date,
tenant,
BITMAP_COUNT(BITMAP_OR_AGG(req_bit_bmp)) as r_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(coll_bit_bmp)) as c_cnt
from daily_bitmaps
group by 1,2, bit_bucket
)
group by 1,2
), data_1_7_win as (
select db.*
,dateadd('day', w.v, db.created_date) as window_date
from daily_bitmaps as db
cross join (values (1),(2),(3),(4),(5),(6),(7)) as w(v)
), data_1_7 as (
select
created_date,
tenant,
sum(r_1_7_cnt) as num_requests_1_7,
sum(c_1_7_cnt) as num_collections_1_7
from (
select
d.created_date,
d.tenant,
d.bit_bucket,
BITMAP_COUNT(BITMAP_OR_AGG(dw.req_bit_bmp)) as r_1_7_cnt,
BITMAP_COUNT(BITMAP_OR_AGG(dw.coll_bit_bmp)) as c_1_7_cnt
from daily_bitmaps as d
join data_1_7_win as dw
on d.tenant = dw.tenant
and d.bit_bucket = dw.bit_bucket
and d.created_date = dw.window_date
group by 1,2, d.bit_bucket
)
group by 1,2
)
select
dt.*,
d7.num_requests_1_7,
d7.num_collections_1_7
from data_today as dt
left join data_1_7 as d7
on dt.created_date = d7.created_date
and dt.tenant = d7.tenant
order by 1,2
;
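OK, that is a big step up. First we fan each day's bitmaps out to the 1-7 days that follow it, which generates more rows but allows an equi-join, so it is complex but fast. Then we aggregate the prior days' bitmaps and join them back to the results. So to get 1-12 and 1-25 you would most likely want to repeat the data_1_7_win CTE, but with 12 and 25 values, and then repeat the data_1_7 CTE for those window values.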
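For the longer windows, typing out 12 or 25 values by hand gets tedious. One possible way to build the offsets is with GENERATOR, sketched here on its own (the CTE name offsets and the fixed example date are my own placeholders, not part of the query above):

```sql
-- Hypothetical helper: day offsets 1..12 instead of a hand-written values list
with offsets as (
    select row_number() over (order by seq4()) as v
    from table(generator(rowcount => 12))
)
select
    v,
    dateadd('day', v, '2023-12-01'::date) as window_date  -- example date only
from offsets
order by v;
```

A data_1_12_win CTE could then cross join daily_bitmaps to offsets in place of the values list, with the rest of the pattern unchanged.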
Yes, this could be done with a range join, but if you have a lot of data, the range join will run much slower than the approach above.
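For comparison, this is roughly what the range-join version could look like on the same fake data. It is a sketch rather than a tuned solution, and the CTE name days is my own. It joins each day back to the raw rows from 1 to 7 days earlier and takes a plain COUNT(DISTINCT):

```sql
with data(created, tenant, comment, simno) as (
    select *
    from values
        ('2023-12-01'::timestamp, 't1', 'R', 1),
        ('2023-12-01'::timestamp, 't1', 'R', 1),
        ('2023-12-01'::timestamp, 't1', 'R', 2),
        ('2023-12-01'::timestamp, 't1', 'C', 3),
        ('2023-12-01'::timestamp, 't1', 'C', 3),
        ('2023-12-02'::timestamp, 't1', 'R', 3),
        ('2023-12-02'::timestamp, 't1', 'R', 4),
        ('2023-12-02'::timestamp, 't1', 'R', 5),
        ('2023-12-02'::timestamp, 't1', 'C', 7),
        ('2023-12-02'::timestamp, 't1', 'C', 1)
), days as (
    -- one row per day and tenant that has any activity
    select distinct created::date as created_date, tenant
    from data
)
select
    d.created_date,
    d.tenant,
    count(distinct iff(p.comment = 'R', p.simno, null)) as num_requests_1_7,
    count(distinct iff(p.comment = 'C', p.simno, null)) as num_collections_1_7
from days as d
left join data as p
    on p.tenant = d.tenant
    -- range condition: rows from 7 days before up to 1 day before
    and p.created::date between dateadd('day', -7, d.created_date)
                            and dateadd('day', -1, d.created_date)
group by 1,2
order by 1,2;
```

With this data, 2023-12-02 should show 2 requests and 1 collection from the day before, and 2023-12-01 shows 0 for both (where the bitmap query above would return NULL from its left join). It is easier to read, but the range condition is what makes it expensive on large tables.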