Forward rolling average in a Hive query


I want to compute a rolling average over a 4-day window. The details are below.

Create table stock(day int, time String, cost float);

Insert into stock values(1,"8 AM",3.1);
Insert into stock values(1,"9 AM",3.2);
Insert into stock values(1,"10 AM",4.5);
Insert into stock values(1,"11 AM",5.5);
Insert into stock values(2,"8 AM",5.1);
Insert into stock values(2,"9 AM",2.2);
Insert into stock values(2,"10 AM",1.5);
Insert into stock values(2,"11 AM",6.5);
Insert into stock values(3,"8 AM",8.1);
Insert into stock values(3,"9 AM",3.2);
Insert into stock values(3,"10 AM",2.5);
Insert into stock values(3,"11 AM",4.5);
Insert into stock values(4,"8 AM",3.1);
Insert into stock values(4,"9 AM",1.2);
Insert into stock values(4,"10 AM",0.5);
Insert into stock values(4,"11 AM",1.5); 

I wrote the following query:

select day, cost,sum(cost) over (order by day range between current row and 4 Following), avg(cost) over (order by day range between current row and 4 Following) 
from stock

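As you can see, I have 4 records per day, and I need the rolling average over a 4-day window, which is why I wrote the window query above. Because I only have 4 days of data with 4 records each, the window for the first day covers all 16 records. The sum for the first record is therefore 56.20, which is correct. The average should be 56.20 / 4 (because there are 4 days), but the query returns 56.20 / 16 because there are 16 records in total. How can I fix the average part?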
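The mismatch described above can be made visible with a small diagnostic query. This is only a sketch against the stock table from the question, using the same frame as the question's query; the aliases window_sum and window_rows are illustrative names, not part of the original.

```sql
-- Diagnostic sketch: show the numerator and the denominator that avg() uses.
-- avg(cost) over a window is sum(cost) / count(cost) over that same window,
-- so the denominator is the number of rows in the frame, not the number of days.
select day,
       cost,
       sum(cost) over (order by day range between current row and 4 following) as window_sum,
       count(*)  over (order by day range between current row and 4 following) as window_rows
from stock;
```

For the rows with day = 1, window_sum should come out at roughly 56.2 (subject to floating-point rounding of the float column) and window_rows at 16, which is why the average is 56.2 / 16 rather than 56.2 / 4.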

Thanks, Raj

sql hive bigdata hql
1 Answer

Is this what you want?

select t.*,
       avg(cost) over (order by day range between current row and 4 following)
from t;

Edit:

What you seem to want is:

select t.*,
       (sum(cost) over (order by day range between current row and 3 following) /
        count(distinct day) over (order by day range between current row and 3 following)
       )
from t;

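However, Hive does not support this. You can use a subquery instead, flagging the first row of each day and summing that flag to count the days: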

select t.*,
       (sum(cost) over (order by day range between current row and 3 following) /
        sum(case when seqnum = 1 then 1 else 0 end) over (order by day range between current row and 3 following)
       )
from (select t.*,
             row_number() over (partition by day order by time) as seqnum
      from t
     ) t;
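As a different route from the query above, and only as a sketch, the data can be collapsed to one row per day first, so that a plain avg() over the window already divides by the number of days. This assumes the stock table from the question; the alias d and the column names day_cost and rolling_avg are illustrative.

```sql
-- Sketch: aggregate to one row per day, then roll over days.
-- With one row per day, avg() over the frame divides by the number of days in it.
select d.day,
       d.day_cost,
       avg(d.day_cost) over (order by d.day range between current row and 3 following) as rolling_avg
from (select day,
             sum(cost) as day_cost
      from stock
      group by day
     ) d;
```

With the sample data, day 1 should give roughly 14.05 (56.2 / 4), up to floating-point rounding. Unlike the subquery version, this returns one row per day rather than one per record, so it would need to be joined back to stock on day if the per-record layout is required.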