SQL row number until column value changes back


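I am trying to create a column that shows, for each product, how many days ago its most recent promotion started. The count should keep running after the promotion ends, until the next promotion starts.

For example, I want the following: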

product  date        in_promo  days_since_last_promo
1        2023-10-01
1        2023-10-02
1        2023-10-03  true      0
1        2023-10-04  true      1
1        2023-10-05  true      2
1        2023-10-06            3
1        2023-10-07            4
1        2023-10-08  true      0
1        2023-10-09  true      1
1        2023-10-10            2

In particular, I am struggling to get the correct days_since_last_promo for these rows:

product  date        in_promo  days_since_last_promo
1        2023-10-06            3
1        2023-10-07            4

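I have been experimenting with lag, row_number() and partitions, but I can't figure it out. Is this possible in SQL?

I would say it is related to this post, but we are trying to achieve something slightly different.

I tried, for example: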

select 
  product
  , date
  , in_promo
  , row_number() over (partition by product, in_promo, seqnum_u - seqnum_uo
                      order by date
    ) as days_since_last_promo
from (select p.*,
             row_number() over (partition by product order by date) as seqnum_u,
             row_number() over (partition by product, in_promo order by date) as seqnum_uo
      from product_sales_data as p
      )

But this gives me

product  date        in_promo  days_since_last_promo
1        2023-10-01            1
1        2023-10-02            2
1        2023-10-03  true      1
1        2023-10-04  true      2
1        2023-10-05  true      3
1        2023-10-06            1
1        2023-10-07            2
1        2023-10-08  true      1
1        2023-10-09  true      2
1        2023-10-10            1

i.e. the row_number restarts when in_promo = false.
sql google-bigquery row-number
2 Answers
0 votes

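Here is a solution using Oracle syntax but with standard analytic functions. It assumes that a promotion starts on the first date of a run of in_promo rows ( in_promo* !in_promo+ ). It would probably be easier with MATCH_RECOGNIZE, but that is not widely supported outside Oracle: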

with data(product, dat, in_promo) as (
    select 1, date '2023-10-01', 'false' from dual union all
    select 1, date '2023-10-02', 'false' from dual union all
    select 1, date '2023-10-03', 'true' from dual union all
    select 1, date '2023-10-04', 'true' from dual union all
    select 1, date '2023-10-05', 'true' from dual union all
    select 1, date '2023-10-06', 'false' from dual union all
    select 1, date '2023-10-07', 'false' from dual union all
    select 1, date '2023-10-08', 'true' from dual union all
    select 1, date '2023-10-09', 'true' from dual union all
    select 1, date '2023-10-10', 'false' from dual
)
select d.product, d.dat, 
    sum(ndays) over(partition by product, grp order by dat) as days_since_last_promo
from (
    select d.*,
        case when in_promo = 0 and grp = 0 then null
        else
            nvl(
                dat - last_value(dat) over(partition by product, grp order by dat 
                    rows between unbounded preceding and 1 preceding),
                0
            )
        end
        as ndays
    from (
        select d.*, 
            sum(change) over(partition by product order by dat) as grp
        from (
            select d.*, 
                decode(in_promo,1,
                    decode(1,lag(in_promo) over(partition by product order by dat),0,1),
                    0
                ) as change
            from (select product, dat, decode(in_promo,'true',1,0) as in_promo from data) d
        ) d
    ) d
) d
order by dat
;


1   01/10/2023 00:00:00 
1   02/10/2023 00:00:00 
1   03/10/2023 00:00:00 0
1   04/10/2023 00:00:00 1
1   05/10/2023 00:00:00 2
1   06/10/2023 00:00:00 3
1   07/10/2023 00:00:00 4
1   08/10/2023 00:00:00 0
1   09/10/2023 00:00:00 1
1   10/10/2023 00:00:00 2

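0 votes

Not sure why your question was downvoted. It has clear sample data and an honest attempt at a solution.

The general idea is to first generate a row-number sequence and flag every record that either starts out as true (in promo) or switches from false to true, i.e. the first row of a new promotion.

Then self-join to get the row number of the most recent earlier change and take the difference, adding 1 if you want the count to start at 1 instead of 0.

This is in Postgres, but I think BigQuery supports all of the syntax.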

https://dbfiddle.uk/dYbyXMGP

create table some_sample_data 
  ( product integer,
    _date date,
    in_promo varchar(100)
  );

insert into some_sample_data values (1, '2023-10-01', 'false');
insert into some_sample_data values (1, '2023-10-02', 'false');
insert into some_sample_data values (1, '2023-10-03', 'true');
insert into some_sample_data values (1, '2023-10-04', 'true');
insert into some_sample_data values (1, '2023-10-05', 'true');
insert into some_sample_data values (1, '2023-10-06', 'false');
insert into some_sample_data values (1, '2023-10-07', 'false');
insert into some_sample_data values (1, '2023-10-08', 'true');
insert into some_sample_data values (1, '2023-10-09', 'true');
insert into some_sample_data values (1, '2023-10-10', 'false');
insert into some_sample_data values (2, '2023-10-01', 'true');
insert into some_sample_data values (2, '2023-10-02', 'false');
insert into some_sample_data values (2, '2023-10-03', 'false');
insert into some_sample_data values (2, '2023-10-04', 'true');

with sequenced_result as (
      SELECT 
        *,
        row_number() over ( partition by product order by cast(_date as date) asc) rn,
        case when in_promo = 'true'
              and coalesce(lag(in_promo) OVER(PARTITION BY product ORDER BY cast(_date as date) asc),'false') = 'false'
             then 1
             else 0
          end originated_as_or_change_to_true
    FROM some_sample_data
  ),
  earlier_snapshots as (
  select t1.*,
         max(t2.rn) as rn_of_last_change
    from sequenced_result t1
    left
    join sequenced_result t2
      on t1.product = t2.product
     and t1._date >= t2._date
     and t2.originated_as_or_change_to_true = 1
   group
      by t1.product,
         t1._date,
         t1.in_promo,
         t1.rn,
         t1.originated_as_or_change_to_true
  )
  select product,
         _date,
         in_promo,
         rn - rn_of_last_change as days_since_last_promo  -- zero-based: the first promo day is 0, as in the question
    from earlier_snapshots
   order 
      by product, 
         _date
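
For BigQuery specifically, the same "flag the promotion start, then look back to the latest start" idea can probably be written without the self-join. The following is only a sketch under some assumptions: in_promo is a BOOL column, the table is called product_sales_data as in the question, and the names flagged and is_promo_start are made up for illustration. It uses a running MAX over the flagged start dates and DATE_DIFF, so it counts calendar days rather than rows.

```sql
-- Inline sample data so the query is self-contained (product 1 from the question).
WITH product_sales_data AS (
  SELECT 1 AS product, DATE '2023-10-01' AS date, FALSE AS in_promo UNION ALL
  SELECT 1, DATE '2023-10-02', FALSE UNION ALL
  SELECT 1, DATE '2023-10-03', TRUE  UNION ALL
  SELECT 1, DATE '2023-10-04', TRUE  UNION ALL
  SELECT 1, DATE '2023-10-05', TRUE  UNION ALL
  SELECT 1, DATE '2023-10-06', FALSE UNION ALL
  SELECT 1, DATE '2023-10-07', FALSE UNION ALL
  SELECT 1, DATE '2023-10-08', TRUE  UNION ALL
  SELECT 1, DATE '2023-10-09', TRUE  UNION ALL
  SELECT 1, DATE '2023-10-10', FALSE
),
flagged AS (
  SELECT
    *,
    -- Assumption: a promotion "starts" on a row that is in promo while the
    -- previous row of the same product is not (or does not exist).
    in_promo
      AND NOT COALESCE(LAG(in_promo) OVER (PARTITION BY product ORDER BY date), FALSE)
      AS is_promo_start
  FROM product_sales_data
)
SELECT
  product,
  date,
  in_promo,
  -- The running MAX carries the latest promotion start date forward;
  -- it is NULL until the first promotion begins, so the result is NULL there too.
  DATE_DIFF(
    date,
    MAX(IF(is_promo_start, date, NULL)) OVER (PARTITION BY product ORDER BY date),
    DAY
  ) AS days_since_last_promo
FROM flagged
ORDER BY product, date;
```

On the sample rows this should give NULL for the first two dates and then 0, 1, 2, 3, 4, 0, 1, 2, matching the expected output in the question. If the data can have missing dates and you want to count rows instead of calendar days, the row-number difference used above is the better fit.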