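I have the Impala query below, which counts the devices in a dataset that appear on one day but not on the day before. I want to run this query across multiple dates (every day for the past year).
Is there any way to do this in Impala? I know there is no loop functionality, but I'm not sure whether there is some way to pass an array of dates to a variable so that the query can run on consecutive dates. Thanks!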
SELECT COUNT(DISTINCT devices)
FROM request
WHERE devices NOT IN (
    SELECT devices
    FROM request
    WHERE forwarded_dt = CAST((CAST('2020-03-17' as timestamp)) as BIGINT)*1000
)
AND forwarded_dt = CAST((CAST('2020-03-18' as timestamp)) as BIGINT)*1000;
You can use lag() for this. I think something like the following:
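select day,
       count(*) as num_devices_on_day,
       -- a device counts as new when it has no row on the immediately preceding day
       sum(case when prev_day = day - interval 1 day then 0 else 1 end) as new_devices_on_day
from (select device, date_trunc('day', timestamp) as day,
             -- the previous day on which this same device appeared
             lag(date_trunc('day', timestamp)) over (partition by device order by min(timestamp)) as prev_day
      from requests
      -- one row per device per day
      group by device, date_trunc('day', timestamp)
     ) r
group by day;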
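If you want to sanity-check the lag() idea before running it on a year of data, here is a minimal sketch that compares it with the original one-day-at-a-time query on toy data. It uses Python's sqlite3 module as a stand-in for Impala, so it only checks the logic, not Impala syntax. It assumes that forwarded_dt holds midnight epoch-millisecond values (as the equality filter in the question suggests), so "the previous day" is exactly 86,400,000 ms earlier. The table contents, DAY_MS, and the variable names are made up for illustration, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

DAY_MS = 86_400_000           # milliseconds in one day
D17 = 1_584_403_200_000       # 2020-03-17 00:00:00 UTC as epoch milliseconds
D18, D19 = D17 + DAY_MS, D17 + 2 * DAY_MS

# Toy stand-in for the `request` table (hypothetical data).
rows = [
    ("a", D17), ("a", D18), ("a", D18), ("a", D19),  # "a" appears every day, twice on the 18th
    ("b", D18), ("b", D19),                          # "b" first appears on the 18th
    ("c", D17), ("c", D19),                          # "c" skips the 18th
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request (devices TEXT, forwarded_dt INTEGER)")
conn.executemany("INSERT INTO request VALUES (?, ?)", rows)

# One pass over the table: all days at once, using LAG per device.
LAG_SQL = """
SELECT forwarded_dt,
       COUNT(*) AS devices_on_day,
       SUM(CASE WHEN prev_dt = forwarded_dt - :day_ms THEN 0 ELSE 1 END) AS not_seen_prev_day
FROM (
    SELECT devices, forwarded_dt,
           LAG(forwarded_dt) OVER (PARTITION BY devices ORDER BY forwarded_dt) AS prev_dt
    FROM (SELECT DISTINCT devices, forwarded_dt FROM request) d
) r
GROUP BY forwarded_dt
ORDER BY forwarded_dt
"""
lag_rows = conn.execute(LAG_SQL, {"day_ms": DAY_MS}).fetchall()

# The original per-day query, parameterized, for comparison.
PER_DAY_SQL = """
SELECT COUNT(DISTINCT devices)
FROM request
WHERE devices NOT IN (SELECT devices FROM request WHERE forwarded_dt = ?)
  AND forwarded_dt = ?
"""
per_day = [conn.execute(PER_DAY_SQL, (day - DAY_MS, day)).fetchone()[0]
           for day in (D17, D18, D19)]

# Print both results side by side: day, devices that day, lag-based count, per-day count.
for (day, total, fresh), expected in zip(lag_rows, per_day):
    print(day, total, fresh, expected)
```

The inner SELECT DISTINCT collapses multiple requests from the same device on the same day, so each device contributes one row per day. To limit the output to the past year, it is probably safer to filter on forwarded_dt in the outer query rather than the innermost one, so that the first reported day can still see the day before it.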