I'm having trouble creating a sequence with row_number and can't work out how to handle it.
I have a table
bc | io | date |
---|---|---|
1a | 11 | 2022-01-01 |
1a | 11 | 2022-01-02 |
1a | 12 | 2022-01-03 |
1a | 11 | 2022-01-04 |
When I use a simple row_number partitioned by bc and io and ordered by date, I get this result
bc | io | date | rn |
---|---|---|---|
1a | 11 | 2022-01-01 | 1 |
1a | 11 | 2022-01-02 | 2 |
1a | 12 | 2022-01-03 | 1 |
1a | 11 | 2022-01-04 | 3 |
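But I need this result instead: whenever io changes, the numbering for the next io should restart from 1, even if that io value has already appeared earlier
bc | io | date | rn |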
---|---|---|---|
1a | 11 | 2022-01-01 | 1 |
1a | 11 | 2022-01-02 | 2 |
1a | 12 | 2022-01-03 | 1 |
1a | 11 | 2022-01-04 | 1 |
I tried this SQL, but the result is not correct
select tt.*,row_number() over(partition by tt.bc,tt.io order by tt.date ) as rn
from (
select '1a' as bc, 11 as io, '2021-01-01' as date
union all
select '1a' as bc, 11 as io, '2021-01-02' as date
union all
select '1a' as bc, 12 as io, '2021-01-03' as date
union all
select '1a' as bc, 11 as io, '2021-01-04' as date
) as tt
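This is a common gaps-and-islands problem: grouping consecutive values of an attribute per key (given some "time-like" dimension). The approach is:
1. Calculate row_number per key, ordered by the time dimension.
2. Calculate row_number per key and the attribute of interest, ordered by the time dimension.
3. Calculate the difference of these two row numbers. It stays constant across consecutive rows with the same attribute value (both counters advance by 1), and for a given attribute value it grows each time that value reappears after an interruption (the second row_number lags behind, or restarts at 1 for a new value, while the first keeps counting).
Numbering the rows within each key, attribute and difference then gives the required sequence. Here is the query: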
with src as (
select inline(array(
struct('1a', 11, date '2022-01-01'),
struct('1a', 11, date '2022-01-02'),
struct('1a', 12, date '2022-01-03'),
struct('1a', 11, date '2022-01-04')
)) as (bc, io, dt)
)
, prepared as (
select
src.*
/*Partition by keys*/
, row_number() over(partition by bc order by dt asc)
/*Partition by keys AND attributes to track changes and create groups*/
- row_number() over(partition by bc, io order by dt asc) as rn_diff
from src
)
select
bc, io, dt
/*Partition by keys AND attributes to track changes AND group number*/
, row_number() over(partition by bc, io, rn_diff order by dt asc) as rn
from prepared
order by dt asc
bc | io | dt | rn |
---|---|---|---|
1a | 11 | 2022-01-01 | 1 |
1a | 11 | 2022-01-02 | 2 |
1a | 12 | 2022-01-03 | 1 |
1a | 11 | 2022-01-04 | 1 |
A dbfiddle based on Postgres (with more attributes added).
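To see why the difference of the two row numbers identifies each island, it can help to trace the intermediate values outside the database. The following is only a minimal Python sketch of the same idea, not part of the original answer: the function name number_within_islands is made up for illustration, and it assumes dt values are unique per bc and sort correctly as ISO-formatted strings.

```python
from collections import defaultdict

def number_within_islands(rows):
    """rows: (bc, io, dt) tuples. Returns (bc, io, dt, rn_diff, rn) tuples ordered by dt."""
    ordered = sorted(rows, key=lambda r: r[2])
    per_key = defaultdict(int)       # row_number() over (partition by bc order by dt)
    per_key_attr = defaultdict(int)  # row_number() over (partition by bc, io order by dt)
    per_island = defaultdict(int)    # row_number() over (partition by bc, io, rn_diff order by dt)
    out = []
    for bc, io, dt in ordered:
        per_key[bc] += 1
        per_key_attr[(bc, io)] += 1
        # Constant inside a run of equal io, larger when the same io comes back later.
        rn_diff = per_key[bc] - per_key_attr[(bc, io)]
        per_island[(bc, io, rn_diff)] += 1
        out.append((bc, io, dt, rn_diff, per_island[(bc, io, rn_diff)]))
    return out

rows = [
    ("1a", 11, "2022-01-01"),
    ("1a", 11, "2022-01-02"),
    ("1a", 12, "2022-01-03"),
    ("1a", 11, "2022-01-04"),
]
for row in number_within_islands(rows):
    print(row)
```

For the sample data, rn_diff comes out as 0, 0, 2, 1, so the last row (io = 11 again) lands in a different group from the first two and its rn restarts at 1.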
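You can manually compare each row with the "previous" row and create a change indicator based on that comparison. A running sum of that indicator then gives you a partition number that identifies each uninterrupted block.
This partition number can be used in the partition clause of your row_number.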
with tt as (
select '1a' as bc, 11 as io, '2021-01-01' as date
union all
select '1a' as bc, 11 as io, '2021-01-02' as date
union all
select '1a' as bc, 12 as io, '2021-01-03' as date
union all
select '1a' as bc, 11 as io, '2021-01-04' as date
), t2 as(
select
tt.*,
case when bc = lag(bc) over (order by date) and io = lag(io) over (order by date) then 0 else 1 end ind
from tt
), t3 as (
select
t2.*,
sum(ind) over ( order by date) pid
from t2
)
select
bc,
io,
date,
row_number() over (partition by pid order by date) rn
from t3
Demo (in MySQL) here.
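The indicator-and-running-sum logic above can be sketched in a few lines of Python to make the intermediate columns visible. This is an illustrative sketch only: number_by_change_flag is a hypothetical name, prev stands in for lag(bc) and lag(io), and the date values are assumed to be unique and sortable as ISO strings.

```python
def number_by_change_flag(rows):
    """rows: (bc, io, date) tuples. Returns (bc, io, date, ind, pid, rn) tuples ordered by date."""
    ordered = sorted(rows, key=lambda r: r[2])
    out = []
    prev = None  # plays the role of lag(bc), lag(io); None for the first row, like SQL NULL
    pid = 0      # running sum of the change indicator
    rn = 0       # row number inside the current block
    for bc, io, date in ordered:
        ind = 0 if prev == (bc, io) else 1  # 1 marks the start of a new block
        pid += ind
        rn = 1 if ind else rn + 1
        out.append((bc, io, date, ind, pid, rn))
        prev = (bc, io)
    return out

rows = [
    ("1a", 11, "2021-01-01"),
    ("1a", 11, "2021-01-02"),
    ("1a", 12, "2021-01-03"),
    ("1a", 11, "2021-01-04"),
]
for row in number_by_change_flag(rows):
    print(row)
```

Because each block is contiguous in date order, restarting rn whenever ind is 1 gives the same numbering as row_number() partitioned by pid.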
Edit: to ignore changes in bc, drop the part of the case condition that mentions bc:
with tt as (
select '1a' as bc, 11 as io, '2021-01-01' as date
union all
select '1b' as bc, 11 as io, '2021-01-02' as date
union all
select '1a' as bc, 12 as io, '2021-01-03' as date
union all
select '1a' as bc, 11 as io, '2021-01-04' as date
), t2 as(
select
tt.*,
case when io = lag(io) over (order by date) then 0 else 1 end ind
from tt
), t3 as (
select
t2.*,
sum(ind) over ( order by date) pid
from t2
)
select
bc,
io,
date,
row_number() over (partition by pid order by date) rn
from t3
Demo in MySQL here.
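The difference between the two variants comes down to which columns define a "run" of consecutive rows. As a rough cross-check outside SQL, Python's itertools.groupby groups consecutive items with equal keys, which is the same notion of a block. The helper name number_runs below is hypothetical, and the sketch again assumes unique, ISO-formatted dates.

```python
from itertools import groupby

def number_runs(rows, key):
    """Number rows 1, 2, ... inside each run of consecutive rows sharing the same key."""
    ordered = sorted(rows, key=lambda r: r[2])  # order by date
    out = []
    for _, run in groupby(ordered, key=key):    # groupby only merges adjacent equal keys
        for rn, row in enumerate(run, start=1):
            out.append(row + (rn,))
    return out

rows = [
    ("1a", 11, "2021-01-01"),
    ("1b", 11, "2021-01-02"),
    ("1a", 12, "2021-01-03"),
    ("1a", 11, "2021-01-04"),
]

io_only = number_runs(rows, key=lambda r: r[1])            # bc changes are ignored
bc_and_io = number_runs(rows, key=lambda r: (r[0], r[1]))  # a bc change also starts a new run

print([r[3] for r in io_only])
print([r[3] for r in bc_and_io])
```

With io alone as the key, the first two rows form one run even though bc differs; with both columns as the key, every row in this sample starts its own run.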