Hey guys, maybe someone has a clue on this. I have a table in the following format, and I need to aggregate the output by id and by status change.
id timestamp status value
82240589 2020-03-01 09:13:46 70 22.00
82240589 2020-03-01 09:13:57 70 34.00
82240589 2020-03-01 09:14:14 70 21.00
82240589 2020-03-01 09:14:22 70 47.00
82240589 2020-03-01 09:14:33 70 32.00
82240589 2020-03-01 09:14:43 83 37.00
82240589 2020-03-01 09:14:52 83 44.00
82240589 2020-03-01 09:15:01 83 39.00
82240589 2020-03-01 09:15:10 70 40.00
82240589 2020-03-01 09:15:19 70 40.00
82240589 2020-03-01 09:16:30 70 5.00
82240589 2020-03-01 09:16:37 70 43.00
82240589 2020-03-01 09:16:46 70 46.00
82240589 2020-03-01 09:16:53 70 53.00
82240589 2020-03-01 09:17:00 70 55.00
82240589 2020-03-01 09:17:08 70 50.00
82240589 2020-03-01 09:17:16 70 46.00
82240589 2020-03-01 09:17:52 70 10.00
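The output should have one row per id and per consecutive run of the same status. I also need aggregates over each run, for example the sum of all values in that period. So the output would look something like this: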
id timestamp_start timestamp_end status sum_value
82240589 2020-03-01 09:13:46 2020-03-01 09:14:33 70 ####
82240589 2020-03-01 09:14:43 2020-03-01 09:15:01 83 ####
82240589 2020-03-01 09:15:10 2020-03-01 09:17:52 70 ####
This is a gaps-and-islands problem.
select id,
       min("timestamp") as start_at,
       max("timestamp") as end_at,
       status,
       sum(value) as sum_value
from (
  -- the running sum of the flag numbers each island, restarting for every id
  select id, "timestamp", status, value,
         group_flag,
         sum(group_flag) over (partition by id order by "timestamp") as group_nr
  from (
    -- group_flag is 1 on the first row after a status change, 0 otherwise
    select *,
           case
             when lag(status,1,status) over (partition by id order by "timestamp") = status then 0
             else 1
           end as group_flag
    from data
    order by id, "timestamp"
  ) t1
) t2
group by group_nr, status, id
order by id, start_at
So the innermost query creates a flag that is 1 on every row whose status differs from the previous row of the same id, and 0 otherwise.
For the given data, this results in:
id | timestamp | status | value | group_flag
---------+---------------------+--------+-------+-----------
82240589 | 2020-03-01 09:13:46 | 70 | 22.00 | 0
82240589 | 2020-03-01 09:13:57 | 70 | 34.00 | 0
82240589 | 2020-03-01 09:14:14 | 70 | 21.00 | 0
82240589 | 2020-03-01 09:14:22 | 70 | 47.00 | 0
82240589 | 2020-03-01 09:14:33 | 70 | 32.00 | 0
82240589 | 2020-03-01 09:14:43 | 83 | 37.00 | 1
82240589 | 2020-03-01 09:14:52 | 83 | 44.00 | 0
82240589 | 2020-03-01 09:15:01 | 83 | 39.00 | 0
82240589 | 2020-03-01 09:15:10 | 70 | 40.00 | 1
82240589 | 2020-03-01 09:15:19 | 70 | 40.00 | 0
82240589 | 2020-03-01 09:16:30 | 70 | 5.00 | 0
82240589 | 2020-03-01 09:16:37 | 70 | 43.00 | 0
82240589 | 2020-03-01 09:16:46 | 70 | 46.00 | 0
82240589 | 2020-03-01 09:16:53 | 70 | 53.00 | 0
82240589 | 2020-03-01 09:17:00 | 70 | 55.00 | 0
82240589 | 2020-03-01 09:17:08 | 70 | 50.00 | 0
82240589 | 2020-03-01 09:17:16 | 70 | 46.00 | 0
82240589 | 2020-03-01 09:17:52 | 70 | 10.00 | 0
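To make the flag logic concrete, here is a minimal Python sketch of what the innermost query computes, applied to a few consecutive rows from the sample. The function name `change_flags` and the tuple layout are assumptions made for illustration; they are not part of the query.

```python
# A few consecutive (id, timestamp, status) rows from the sample,
# already sorted by id and timestamp.
rows = [
    (82240589, "2020-03-01 09:14:22", 70),
    (82240589, "2020-03-01 09:14:33", 70),
    (82240589, "2020-03-01 09:14:43", 83),
    (82240589, "2020-03-01 09:14:52", 83),
    (82240589, "2020-03-01 09:15:01", 83),
    (82240589, "2020-03-01 09:15:10", 70),
    (82240589, "2020-03-01 09:15:19", 70),
]


def change_flags(rows):
    """Return 1 where status differs from the previous row of the same id, else 0."""
    flags = []
    prev_id, prev_status = None, None
    for row_id, _timestamp, status in rows:
        if row_id != prev_id:
            # First row of an id: lag(status, 1, status) falls back to the
            # row's own status, so the comparison holds and the flag is 0.
            flags.append(0)
        else:
            flags.append(0 if status == prev_status else 1)
        prev_id, prev_status = row_id, status
    return flags


print(change_flags(rows))  # [0, 0, 1, 0, 0, 1, 0]
```

The special case for the first row of each id mirrors the third argument of `lag()`, which supplies a default when there is no previous row.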
The next level turns that flag into group numbers by taking a running sum of it. For the given data, this results in:
id | timestamp | status | value | group_nr
---------+---------------------+--------+-------+---------
82240589 | 2020-03-01 09:13:46 | 70 | 22.00 | 0
82240589 | 2020-03-01 09:13:57 | 70 | 34.00 | 0
82240589 | 2020-03-01 09:14:14 | 70 | 21.00 | 0
82240589 | 2020-03-01 09:14:22 | 70 | 47.00 | 0
82240589 | 2020-03-01 09:14:33 | 70 | 32.00 | 0
82240589 | 2020-03-01 09:14:43 | 83 | 37.00 | 1
82240589 | 2020-03-01 09:14:52 | 83 | 44.00 | 1
82240589 | 2020-03-01 09:15:01 | 83 | 39.00 | 1
82240589 | 2020-03-01 09:15:10 | 70 | 40.00 | 2
82240589 | 2020-03-01 09:15:19 | 70 | 40.00 | 2
82240589 | 2020-03-01 09:16:30 | 70 | 5.00 | 2
82240589 | 2020-03-01 09:16:37 | 70 | 43.00 | 2
82240589 | 2020-03-01 09:16:46 | 70 | 46.00 | 2
82240589 | 2020-03-01 09:16:53 | 70 | 53.00 | 2
82240589 | 2020-03-01 09:17:00 | 70 | 55.00 | 2
82240589 | 2020-03-01 09:17:08 | 70 | 50.00 | 2
82240589 | 2020-03-01 09:17:16 | 70 | 46.00 | 2
82240589 | 2020-03-01 09:17:52 | 70 | 10.00 | 2
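The step from `group_flag` to `group_nr` is just a running sum, which can be sketched in Python with `itertools.accumulate`. The list below is copied from the `group_flag` column of the first result table; the variable names are chosen for illustration only.

```python
from itertools import accumulate

# group_flag for the 18 sample rows, in timestamp order.
group_flag = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Each 1 marks the start of a new island, so the running total
# gives every row the number of the island it belongs to.
group_nr = list(accumulate(group_flag))

print(group_nr)  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

With more than one id in the table, the running sum would have to restart for each id, which is what a `partition by id` in the window definition does.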
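As we can see, each run of consecutive rows with the same status now has its own number, which can be used for the grouped aggregation that is then done in the outermost query.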
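The nesting of queries is necessary because window function calls can't be nested: the flag has to be computed in an inner query before the running sum can be applied to it.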
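If you want to try this without a database server, the following is a rough sketch that runs the same three layers, written as common table expressions instead of nested subqueries, against the sample data. It uses SQLite through Python's `sqlite3` module purely as a stand-in, which is an assumption on my part: it needs an SQLite build with window function support (3.25 or later) and may behave differently from your database in details. The helper name `islands` is made up for this example.

```python
import sqlite3

ID = 82240589

# (timestamp, status, value) rows from the question.
SAMPLE = [
    ("2020-03-01 09:13:46", 70, 22.0),
    ("2020-03-01 09:13:57", 70, 34.0),
    ("2020-03-01 09:14:14", 70, 21.0),
    ("2020-03-01 09:14:22", 70, 47.0),
    ("2020-03-01 09:14:33", 70, 32.0),
    ("2020-03-01 09:14:43", 83, 37.0),
    ("2020-03-01 09:14:52", 83, 44.0),
    ("2020-03-01 09:15:01", 83, 39.0),
    ("2020-03-01 09:15:10", 70, 40.0),
    ("2020-03-01 09:15:19", 70, 40.0),
    ("2020-03-01 09:16:30", 70, 5.0),
    ("2020-03-01 09:16:37", 70, 43.0),
    ("2020-03-01 09:16:46", 70, 46.0),
    ("2020-03-01 09:16:53", 70, 53.0),
    ("2020-03-01 09:17:00", 70, 55.0),
    ("2020-03-01 09:17:08", 70, 50.0),
    ("2020-03-01 09:17:16", 70, 46.0),
    ("2020-03-01 09:17:52", 70, 10.0),
]

# One window function per level: first the flag, then its running sum.
QUERY = """
with flagged as (
    select id, "timestamp", status, value,
           case
             when lag(status, 1, status)
                  over (partition by id order by "timestamp") = status then 0
             else 1
           end as group_flag
    from data
),
numbered as (
    select id, "timestamp", status, value,
           sum(group_flag) over (partition by id order by "timestamp") as group_nr
    from flagged
)
select id,
       min("timestamp") as start_at,
       max("timestamp") as end_at,
       status,
       sum(value) as sum_value
from numbered
group by id, group_nr, status
order by id, start_at
"""


def islands(rows, row_id=ID):
    """Load rows into an in-memory table and return one tuple per island."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'create table data (id integer, "timestamp" text, status integer, value real)'
    )
    conn.executemany(
        "insert into data values (?, ?, ?, ?)",
        [(row_id, ts, status, value) for ts, status, value in rows],
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in islands(SAMPLE):
    print(row)
```

For the sample data this should give three rows, with value sums of 156, 120 and 388 for the 70, 83 and 70 runs respectively.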