Aggregate preceding rows only where one value in the preceding rows is less than another


I have a situation where each machine has two counts, p and r.

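p should always be greater than or equal to r, but because of technical lag and short aggregation periods this is not always the case: the r counts often (but not always) show data that belongs to a previous period. Because the length of the lag is not constant, there is no way of knowing exactly which period an r value belongs to. I therefore cannot simply shift all r counts back in time by a uniform amount, as this could introduce other discrepancies that were not there before.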

This situation cannot be changed, and I have to work with the data as it is.

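In the example below you can see that the p count briefly "pauses" on machine 1 and slows significantly on machine 2, while the r count continues to return values greater than p for a short while before it also "pauses":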

-- Dummy data
declare @t table(d date,m int,p int,r int);
insert into @t values(getdate()-9,1,100,10),(getdate()-8,1,90 ,10),(getdate()-7,1,70 ,10),(getdate()-6,1,70 ,10),(getdate()-5,1,80 ,10),(getdate()-4,1,50 ,10),(getdate()-3,1,10 ,10),(getdate()-2,1,0  ,10),(getdate()-1,1,0  ,10),(getdate()+0,1,0  ,10),(getdate()+1,1,0  ,0),(getdate()+2,1,0  ,0),(getdate()+3,1,40 ,0),(getdate()+4,1,50 ,0),(getdate()+5,1,80 ,10),(getdate()-9,2,1100,100),(getdate()-8,2,190 ,100),(getdate()-7,2,170 ,100),(getdate()-6,2,170 ,100),(getdate()-5,2,180 ,100),(getdate()-4,2,150 ,100),(getdate()-3,2,110 ,100),(getdate()-2,2,10  ,100),(getdate()-1,2,10  ,100),(getdate()+0,2,10  ,100),(getdate()+1,2,10  ,0),(getdate()+2,2,10  ,0),(getdate()+3,2,140 ,0),(getdate()+4,2,150 ,0),(getdate()+5,2,180 ,100);
select * from @t order by m,d;

-- Output
+------------+---+------+-----+
|     d      | m |  p   |  r  |
+------------+---+------+-----+
| 2020-05-27 | 1 |  100 |  10 |
| 2020-05-28 | 1 |   90 |  10 |
| 2020-05-29 | 1 |   70 |  10 |
| 2020-05-30 | 1 |   70 |  10 |
| 2020-05-31 | 1 |   80 |  10 |
| 2020-06-01 | 1 |   50 |  10 |
| 2020-06-02 | 1 |   10 |  10 |
| 2020-06-03 | 1 |    0 |  10 |
| 2020-06-04 | 1 |    0 |  10 |
| 2020-06-05 | 1 |    0 |  10 |
| 2020-06-06 | 1 |    0 |   0 |
| 2020-06-07 | 1 |    0 |   0 |
| 2020-06-08 | 1 |   40 |   0 |
| 2020-06-09 | 1 |   50 |   0 |
| 2020-06-10 | 1 |   80 |  10 |
| 2020-05-27 | 2 | 1100 | 100 |
| 2020-05-28 | 2 |  190 | 100 |
| 2020-05-29 | 2 |  170 | 100 |
| 2020-05-30 | 2 |  170 | 100 |
| 2020-05-31 | 2 |  180 | 100 |
| 2020-06-01 | 2 |  150 | 100 |
| 2020-06-02 | 2 |  110 | 100 |
| 2020-06-03 | 2 |   10 | 100 |
| 2020-06-04 | 2 |   10 | 100 |
| 2020-06-05 | 2 |   10 | 100 |
| 2020-06-06 | 2 |   10 |   0 |
| 2020-06-07 | 2 |   10 |   0 |
| 2020-06-08 | 2 |  140 |   0 |
| 2020-06-09 | 2 |  150 |   0 |
| 2020-06-10 | 2 |  180 | 100 |
+------------+---+------+-----+
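
The rows that break the p >= r rule can be listed directly, which also gives the amount of r that has to be moved back. The following is a minimal sketch and not part of the original question; the table variable @sample is a hypothetical name and holds only five of the machine 1 rows above, with fixed dates, so that it runs on its own:

```sql
-- Hypothetical sample: five of the machine 1 rows from the dummy data above.
declare @sample table(d date, m int, p int, r int);
insert into @sample values
 ('20200601',1,50,10),('20200602',1,10,10),('20200603',1,0,10)
,('20200604',1,0,10),('20200605',1,0,10);

-- Rows where r exceeds p, and how much r is in excess per machine.
select m
      ,count(*)   as violating_rows
      ,sum(r - p) as excess_r
from @sample
where r > p
group by m;
```

For this sample it returns a single row for m = 1 with 3 violating rows and an excess of 30, which is the amount of r that needs to be reallocated to earlier rows.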

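I need to be able to adjust those r counts backwards in some reasonable manner, adding them to earlier rows so that every p value is greater than or equal to its corresponding r value.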

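For the m = 1 example above, the output could look like any of the following r counts; I don't mind how far back the adjustment reaches, only that p >= r holds on every row: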

+------------+---+------+------+------+------+
|     d      | m |  p   |  r1  |  r2  |  r3  |
+------------+---+------+------+------+------+
| 2020-05-27 | 1 |  100 |   10 |   10 |   10 |
| 2020-05-28 | 1 |   90 |   10 |   10 |   10 |
| 2020-05-29 | 1 |   70 |   10 |   15 |   10 |
| 2020-05-30 | 1 |   70 |   20 |   20 |   10 |) Note how the original 30 r counts
| 2020-05-31 | 1 |   80 |   20 |   20 |   10 |} that didn't follow the rule
| 2020-06-01 | 1 |   50 |   20 |   15 |   40 |) have been moved back in time
| 2020-06-02 | 1 |   10 |   10 |   10 |   10 |
| 2020-06-03 | 1 |    0 |    0 |    0 |    0 |
| 2020-06-04 | 1 |    0 |    0 |    0 |    0 |
| 2020-06-05 | 1 |    0 |    0 |    0 |    0 |
| 2020-06-06 | 1 |    0 |    0 |    0 |    0 |
| 2020-06-07 | 1 |    0 |    0 |    0 |    0 |
| 2020-06-08 | 1 |   40 |    0 |    0 |    0 |
| 2020-06-09 | 1 |   50 |    0 |    0 |    0 |
| 2020-06-10 | 1 |   80 |   10 |   10 |   10 |
+------------+---+------+------+------+------+

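I have tried to solve this with window functions, rows between and so on, but I cannot work out how to identify the r values that need to be reallocated to earlier periods, or how to determine which p rows they should be allocated to. If I make any progress I will add it below, but any help is much appreciated.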


Attempt 1

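The closest I have managed is the following, which works for the data above but fails when you change the p = 50 value to something less than 40. It also adjusts both forwards and backwards in time, when I only want to adjust backwards: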

with t as(
select row_number() over (partition by m order by d) as rn
      ,(row_number() over (partition by m order by d)-1) / 5 as gn
      ,*
from @t
where m = 1
)
select *
      ,case when p > r
            then r + (sum(case when p < r then r else 0 end) over (partition by gn) / sum(case when p > r then 1 else 0 end) over (partition by gn))
            else case when p = r
                      then r
                      else 0
                      end
            end as r_adj
from t;

Attempt 2

Closer, but this still adjusts both forwards and backwards:

with t as(
select row_number() over (partition by m order by d) as rn
      ,(row_number() over (partition by m order by d)-1) / 10 as gn
      ,(row_number() over (partition by m order by d)+4) / 10 as gn2
      ,*
from @t
where m = 1
)
,r1 as(
select *
      ,case when p > r
            then r + (sum(case when p < r then r - p else 0 end) over (partition by gn) / sum(case when p > r then 1. else 0. end) over (partition by gn))
            else case when p = r
                      then r
                      else 0
                      end
            end as r_adj
from t
)
select d
      ,m
      ,p
      ,r
      ,case when p > r_adj
            then r_adj + (sum(case when p < r_adj then r_adj - p else 0 end) over (partition by gn2) / sum(case when p > r_adj then 1. else 0. end) over (partition by gn2))
            else case when p = r_adj
                      then r_adj
                      else r_adj - (r_adj - p)
                      end
            end as r_new
from r1
order by rn
;
    


sql sql-server time-series sql-server-2016 window-functions
1 answer
0 votes

One approach uses apply:

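-- For each row, take r from the latest row on or after it (same machine)
-- whose r does not exceed this row's p. @t is the table variable from the question.
select t.*,
       t2.r as imputed_r
from @t t outer apply
     (select top (1) t2.*
      from @t t2
      where t2.m = t.m and
            t2.d >= t.d and t2.r <= t.p
      order by t2.d desc
     ) t2;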
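
As an alternative sketch that is not part of the answer above, the backwards-only adjustment can be written with window functions alone. It assumes one particular rule, which the question does not prescribe: walk each machine's rows from newest to oldest, cap r at p, and carry any excess to the next older row. That carry equals a running sum of r - p minus its running minimum, where the minimum is capped at zero. The table variable @t1 and the column names rs, carry and r_adj are names chosen for illustration; @t1 holds the machine 1 rows from the question with fixed dates.

```sql
-- Hypothetical copy of the machine 1 rows from the question, with fixed dates.
declare @t1 table(d date, m int, p int, r int);
insert into @t1 values
 ('20200527',1,100,10),('20200528',1,90,10),('20200529',1,70,10)
,('20200530',1,70,10),('20200531',1,80,10),('20200601',1,50,10)
,('20200602',1,10,10),('20200603',1,0,10),('20200604',1,0,10)
,('20200605',1,0,10),('20200606',1,0,0),('20200607',1,0,0)
,('20200608',1,40,0),('20200609',1,50,0),('20200610',1,80,10);

with s as(
-- Running sum of (r - p), newest row first.
select *
      ,sum(r - p) over (partition by m order by d desc
                        rows between unbounded preceding and current row) as rs
from @t1
)
,c as(
-- Excess r still waiting to be placed after this row:
-- the running sum minus its running minimum, with the minimum capped at 0.
select *
      ,rs - case when min(rs) over (partition by m order by d desc
                                    rows between unbounded preceding and current row) < 0
                 then min(rs) over (partition by m order by d desc
                                    rows between unbounded preceding and current row)
                 else 0
                 end as carry
from s
)
select d
      ,m
      ,p
      ,r
      -- Own r plus the carry arriving from the next newer row, less the carry passed on.
      ,r + lag(carry,1,0) over (partition by m order by d desc) - carry as r_adj
      ,carry
from c
order by m,d;
```

For machine 1 this gives r_adj = 40 on 2020-06-01, 10 on 2020-06-02 and 0 on 2020-06-03 to 2020-06-05, which matches the r3 column in the question. A non-zero carry on a machine's oldest row would mean that some r could not be placed on any earlier row.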