Hive lag window with partition

Question (votes: 1, answers: 1)

Here is my table:

sensor_name, ext_value, int_value, growth
47ACXVMACSENS01, 238, 157, 1
47ACXVMACSENS01, 157, 256, 2
47ACXVMACSENS01, 895, 345, 3
47ACXVMACSENS01, 79, 861, 3
91DKCVMACSENS02, 904, 858, 1
91DKCVMACSENS02, 925, 588, 1
91DKCVMACSENS02, 15, 738, 1
91DKCVMACSENS02, 77, 38, 2

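The first three columns (sensor_name, ext_value, int_value) are the given data. The fourth column, growth, is the computed column I want; it is derived from the ext_value and int_value columns within each sensor_name group.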

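The growth column is computed as follows: within each sensor_name group, compare each row's int_value with the previous row's ext_value. If there is no previous row, its ext_value is taken as 0. If the current row's int_value is greater than the previous row's ext_value, growth increases by 1. If the current int_value is less than the previous ext_value, growth keeps the same value as the previous row's growth.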

In the example above:

for the very first row, 157 is compared with the previous row's ext_value, which doesn't exist, so it is taken as 0;
157 > 0, so growth increases by 1 from 0.
on the 2nd row, 256 > 238, so growth = 1+1=2
on the 3rd row, 345 > 157, so growth = 2+1=3
on the 4th row, 861 < 895, so growth stays at the previous value, 3.

then the logic is re-applied to the second sensor_name group:
1st row, 858 > 0 (because there is no previous row for this sensor_name), so growth = 1
2nd row, 588 < 904, so growth = 1
3rd row, 738 < 925, so growth = 1
4th row, 38 > 15, so growth = 1+1=2

I have tried using a lag window partitioned by sensor_name, but so far it has not given me the correct result.

How can I solve this?

hive window lag partition
1 answer
0
votes

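Use lag to get the previous ext_value, compute a growth flag, and use a running count of that flag to compute growth. As you said in the comments, I added an rcv_time column.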
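The steps described above (lag, growth flag, running count) might look roughly like the sketch below. It is not the original query. The table name `sensor_readings`, the alias `flagged`, the helper `compute_growth`, and the integer `rcv_time` values are all hypothetical; `rcv_time` is only assumed to give the row order within each sensor. The SQL uses `LAG(col, 1, 0)` and a windowed `SUM`, which Hive also provides, but the sketch runs it through Python's built-in `sqlite3` module (SQLite 3.25 or newer) purely so that it can be executed without a Hive cluster.

```python
import sqlite3

# Sample rows from the question: (sensor_name, rcv_time, ext_value, int_value).
# The rcv_time values are placeholders (assumption): they only encode row order.
ROWS = [
    ("47ACXVMACSENS01", 1, 238, 157),
    ("47ACXVMACSENS01", 2, 157, 256),
    ("47ACXVMACSENS01", 3, 895, 345),
    ("47ACXVMACSENS01", 4, 79, 861),
    ("91DKCVMACSENS02", 1, 904, 858),
    ("91DKCVMACSENS02", 2, 925, 588),
    ("91DKCVMACSENS02", 3, 15, 738),
    ("91DKCVMACSENS02", 4, 77, 38),
]

# Inner query: flag rows whose int_value exceeds the previous row's ext_value
# (0 when the sensor has no previous row). Outer query: running sum of flags.
QUERY = """
SELECT sensor_name, rcv_time, ext_value, int_value,
       SUM(growth_flag) OVER (
           PARTITION BY sensor_name
           ORDER BY rcv_time
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS growth
FROM (
    SELECT sensor_name, rcv_time, ext_value, int_value,
           CASE
               WHEN int_value > LAG(ext_value, 1, 0) OVER (
                        PARTITION BY sensor_name ORDER BY rcv_time)
               THEN 1 ELSE 0
           END AS growth_flag
    FROM sensor_readings
) flagged
ORDER BY sensor_name, rcv_time
"""


def compute_growth(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sensor_readings ("
            "sensor_name TEXT, rcv_time INTEGER, "
            "ext_value INTEGER, int_value INTEGER)"
        )
        conn.executemany(
            "INSERT INTO sensor_readings VALUES (?, ?, ?, ?)", rows
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in compute_growth(ROWS):
        print(row)
```

On the sample data this gives growth values 1, 2, 3, 3 for the first sensor and 1, 1, 1, 2 for the second, matching the table in the question. The window function is kept in a subquery because window functions cannot be nested directly inside another window function's argument. The explicit `ROWS` frame keeps the running sum row-by-row even if two rows share the same `rcv_time`.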
