ROW_NUMBER or another sequence depending on date (SQL)

Question (votes: 0, answers: 2)

I'm having trouble creating a sequence with row_number and still can't get it to work.

I have a table

bc io date
1a 11 2022-01-01
1a 11 2022-01-02
1a 12 2022-01-03
1a 11 2022-01-04

When I use a simple row_number partitioned by bc, io and ordered by date, I get this result

bc io date rn
1a 11 2022-01-01 1
1a 11 2022-01-02 2
1a 12 2022-01-03 1
1a 11 2022-01-04 3

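But I need this result: when io changes, the numbering for the next io value, even one that has already been seen, should start again from 1

bc io date rn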
1a 11 2022-01-01 1
1a 11 2022-01-02 2
1a 12 2022-01-03 1
1a 11 2022-01-04 1

I tried this SQL, but it is not correct

select tt.*,row_number() over(partition by tt.bc,tt.io order by tt.date ) as rn
from (
    select '1a' as bc, 11 as io, '2021-01-01' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-02' as date
    union all
    select '1a' as bc, 12 as io, '2021-01-03' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-04' as date
) as tt
sql hive sequence gaps-and-islands row-number
2 Answers

0 votes

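This is a common gaps-and-islands problem: grouping consecutive values of an attribute per key (given some "time-like" dimension). The approach is:

  • Calculate row_number per key, ordered by the time dimension.
  • Calculate row_number per key and the attribute of interest, ordered by the time dimension.
  • Take their difference. For a given attribute value the difference stays constant within a run of consecutive rows and increases from one run to the next, because the second row_number falls behind the first while the attribute holds another value. The difference therefore identifies each run.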

Here is the query:

with src as (
    select inline(array(
      struct('1a', 11, date '2022-01-01'),
      struct('1a', 11, date '2022-01-02'),
      struct('1a', 12, date '2022-01-03'),
      struct('1a', 11, date '2022-01-04')
    )) as (bc, io, dt)
)
, prepared as (
  select
    src.*
    /*Partition by keys*/
    , row_number() over(partition by bc order by dt asc)
        /*Partition by keys AND attributes to track changes and create groups*/
      - row_number() over(partition by bc, io order by dt asc) as rn_diff
  from src
)
select
  bc, io, dt
  /*Partition by keys AND attributes to track changes AND group number*/
  , row_number() over(partition by bc, io, rn_diff order by dt asc) as rn
from prepared
order by dt asc
bc io dt rn
1a 11 2022-01-01 1
1a 11 2022-01-02 2
1a 12 2022-01-03 1
1a 11 2022-01-04 1

A dbfiddle based on Postgres (with more attributes added).
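To make the intermediate values visible, the three steps can be traced outside the database. The following is only a sketch in plain Python, not part of the answer's query: the helper row_number is a hypothetical stand-in for row_number() over(partition by ... order by ...), and the rows are the four sample rows from the question.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (bc, io, dt)
rows = [
    ("1a", 11, date(2022, 1, 1)),
    ("1a", 11, date(2022, 1, 2)),
    ("1a", 12, date(2022, 1, 3)),
    ("1a", 11, date(2022, 1, 4)),
]


def row_number(rows, partition_key, order_key):
    """Hypothetical helper mimicking row_number() over(partition by ... order by ...).

    Returns one number per input row, in the original row order.
    """
    counters = defaultdict(int)
    numbers = {}
    for idx in sorted(range(len(rows)), key=lambda i: order_key(rows[i])):
        part = partition_key(rows[idx])
        counters[part] += 1
        numbers[idx] = counters[part]
    return [numbers[i] for i in range(len(rows))]


def by_dt(row):
    return row[2]


# Step 1: row number per key (bc)
rn_key = row_number(rows, lambda r: r[0], by_dt)
# Step 2: row number per key and attribute (bc, io)
rn_attr = row_number(rows, lambda r: (r[0], r[1]), by_dt)
# Step 3: the difference labels each run of consecutive io values
rn_diff = [a - b for a, b in zip(rn_key, rn_attr)]

# Final numbering: partition by bc, io and the run label
rows_with_diff = [r + (d,) for r, d in zip(rows, rn_diff)]
rn = row_number(rows_with_diff, lambda r: (r[0], r[1], r[3]), by_dt)

print(rn_key)   # [1, 2, 3, 4]
print(rn_attr)  # [1, 2, 1, 3]
print(rn_diff)  # [0, 0, 2, 1]
print(rn)       # [1, 2, 1, 1]
```

The two runs of io = 11 get the labels 0 and 1, so partitioning by bc, io and rn_diff separates them, which the plain partition by bc, io cannot do.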


0 votes

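You can manually compare each row with the "previous" row and create a change indicator based on that comparison. Taking a running sum of that indicator then gives you a partition number that identifies each uninterrupted block.

This partition number can be used in the partition clause of row_number.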

with tt as (
    select '1a' as bc, 11 as io, '2021-01-01' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-02' as date
    union all
    select '1a' as bc, 12 as io, '2021-01-03' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-04' as date
), t2 as(
  select 
    tt.*,
    case when bc = lag(bc) over (order by date) and io = lag(io) over (order by date) then 0 else 1 end ind
  from tt
), t3 as (
  select 
    t2.*, 
    sum(ind) over ( order by date) pid 
  from t2
)
select 
  bc,
  io, 
  date, 
  row_number() over (partition by pid order by date) rn
from t3

Demo (in MySQL) here.


EDIT: To ignore changes in bc, leave out the part of the case condition that mentions bc

with tt as (
    select '1a' as bc, 11 as io, '2021-01-01' as date
    union all
    select '1b' as bc, 11 as io, '2021-01-02' as date
    union all
    select '1a' as bc, 12 as io, '2021-01-03' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-04' as date
), t2 as(
  select 
    tt.*,
    case when io = lag(io) over (order by date) then 0 else 1 end ind
  from tt
), t3 as (
  select 
    t2.*, 
    sum(ind) over ( order by date) pid 
  from t2
)
select 
  bc,
  io, 
  date, 
  row_number() over (partition by pid order by date) rn
from t3

Demo in MySQL here.
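The difference between the two variants shows up in the indicator column. As a rough illustration only, the sketch below replays the lag and running-sum logic in plain Python on the sample rows of the edited query. The function number_runs and its compare parameter are hypothetical names introduced here, and the rows are assumed to be sorted by date already.

```python
from itertools import accumulate

# Sample rows from the edited query: (bc, io, date), already sorted by date
rows = [
    ("1a", 11, "2021-01-01"),
    ("1b", 11, "2021-01-02"),
    ("1a", 12, "2021-01-03"),
    ("1a", 11, "2021-01-04"),
]


def number_runs(rows, compare):
    """Hypothetical helper replaying the ind / pid / rn columns of the query.

    compare picks the columns whose change starts a new block.
    """
    # ind: 0 when the compared columns equal those of the previous row, else 1
    # (the first row has no previous row, so it gets 1, like the SQL else branch)
    ind = [
        0 if i > 0 and compare(rows[i]) == compare(rows[i - 1]) else 1
        for i in range(len(rows))
    ]
    # pid: running sum of ind, one number per uninterrupted block
    pid = list(accumulate(ind))
    # rn: row number within each pid, restarting at 1 when pid changes
    rn = []
    for i, p in enumerate(pid):
        rn.append(1 if i == 0 or p != pid[i - 1] else rn[-1] + 1)
    return ind, pid, rn


# First variant: a change in bc or io starts a new block
print(number_runs(rows, lambda r: (r[0], r[1])))
# ([1, 1, 1, 1], [1, 2, 3, 4], [1, 1, 1, 1])

# Edited variant: only a change in io starts a new block
print(number_runs(rows, lambda r: r[1]))
# ([1, 0, 1, 1], [1, 1, 2, 3], [1, 2, 1, 1])
```

With the question's original rows, where bc is always 1a, both variants give rn = 1, 2, 1, 1.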
