Sample data
touristid|day
ABC|1
ABC|1
ABC|2
ABC|4
ABC|5
ABC|6
ABC|8
ABC|10
The output should be
touristid|trip
ABC|4
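The logic behind the 4 is runs of consecutive days: the sequence 1,1,2 is the first trip, then 4,5,6 is the second, then 8 is the third, and 10 is the fourth. I would like to get this output using an Impala query.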
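Use the lag() function to get the previous day. If day - prev_day > 1, or there is no previous day, set new_trip_flag; then count(new_trip_flag) per touristid. Duplicate days give a difference of 0, so they stay in the same trip.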
Demo:
with table1 as (
  -- sample data from the question
  select 'ABC' as touristid, 1 as day union all
  select 'ABC' as touristid, 1 as day union all
  select 'ABC' as touristid, 2 as day union all
  select 'ABC' as touristid, 4 as day union all
  select 'ABC' as touristid, 5 as day union all
  select 'ABC' as touristid, 6 as day union all
  select 'ABC' as touristid, 8 as day union all
  select 'ABC' as touristid, 10 as day
)
-- count() ignores NULLs, so only rows that start a trip are counted
select touristid, count(new_trip_flag) as trip_cnt
from
(
  select touristid, day, prev_day,
         -- a trip starts on the first row or after a gap of more than one day
         case when (day - prev_day) > 1 or prev_day is NULL then true end as new_trip_flag
  from
  (
    select touristid, day,
           lag(day) over (partition by touristid order by day) as prev_day
    from table1
  ) s
) s
group by touristid;
Result:
touristid trip_cnt
ABC 4
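
To cross-check the gap rule outside Impala, here is a minimal Python sketch of the same logic, not part of the original answer. The names count_trips and rows are hypothetical, and the sketch assumes the data fits in memory as (touristid, day) tuples.

```python
from itertools import groupby
from operator import itemgetter


def count_trips(rows):
    """Count trips per tourist; a trip starts when day - prev_day > 1."""
    trips = {}
    # Sorting by (touristid, day) plays the role of
    # "partition by touristid order by day" in the query.
    for touristid, group in groupby(sorted(rows), key=itemgetter(0)):
        prev_day = None  # equivalent of lag(day) on the first row
        count = 0
        for _, day in group:
            if prev_day is None or day - prev_day > 1:
                count += 1  # this row would get new_trip_flag = true
            prev_day = day
        trips[touristid] = count
    return trips


# Hypothetical in-memory copy of the sample data from the question.
rows = [("ABC", 1), ("ABC", 1), ("ABC", 2), ("ABC", 4),
        ("ABC", 5), ("ABC", 6), ("ABC", 8), ("ABC", 10)]
print(count_trips(rows))  # {'ABC': 4}
```

The repeated day 1 gives a difference of 0, so it does not start a new trip, which matches the behaviour of the query.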