I want to calculate session intervals per agent for multiple, possibly overlapping connections. I have a log table like this in a PostgreSQL database:
id dt session_id agent_id state
7 2024-01-25 22:26:57 4 3148 0
7 2024-01-25 22:24:57 4 3148 1
6 2024-01-25 22:23:57 2 3148 0
5 2024-01-25 15:53:30 1 3148 0
4 2024-01-25 15:53:30 3 3148 0
3 2024-01-25 13:53:02 3 3148 1
2 2024-01-25 12:43:10 2 3148 1
1 2024-01-25 12:30:02 1 3148 1
For a single connection I use the lag() function:
select agent_id, status, dt, lag(dt) over (partition by (agent_id) order by dt) as pre_status_dt
from (
   select *,
          lag(status) over (partition by (agent_id) order by dt) as pre_status
   from (
      select cs.dt, cs.user_id as agent_id, cs.status
      from "channel_subscribe" cs
      where cs.channel = 14
   ) as t1
   where t1.dt >= '2024-01-01'
) as t2
where t2.status != t2.pre_status
order by dt asc
But this does not work for overlapping connections.
I want to get this result:
agent_id start_dt end_dt
3148 2024-01-25 12:30:02 2024-01-25 22:23:57
3148 2024-01-25 22:24:57 2024-01-25 22:26:57
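Your sample data shows nested sessions: additional sessions start and end within the first open session of the same agent. Alternatively, sessions might be chained: the next session starts within the first one but ends later.
The following solutions cover both cases: a combined session starts with the first opened session and ends when all open sessions are closed.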
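To make that definition concrete, here is a minimal Python sketch, not part of the SQL solution: it treats each session as a (start, end) pair and merges overlapping pairs. This is plain interval merging, a different technique from the event counting used in the SQL. The pairs were derived by hand from the sample log per session_id, and the name combine_sessions is made up for illustration.

```python
# Sessions from the sample log as (start, end) pairs, one per session_id.
# ISO-8601 timestamps sort chronologically as plain strings.
SESSIONS = [
    ("2024-01-25 12:30:02", "2024-01-25 15:53:30"),  # session 1
    ("2024-01-25 12:43:10", "2024-01-25 22:23:57"),  # session 2: starts inside session 1, ends later (chained)
    ("2024-01-25 13:53:02", "2024-01-25 15:53:30"),  # session 3: lies within session 1 (nested)
    ("2024-01-25 22:24:57", "2024-01-25 22:26:57"),  # session 4
]


def combine_sessions(sessions):
    """Merge overlapping (start, end) intervals into combined sessions."""
    combined = []
    for start, end in sorted(sessions):
        # Assumption: a session starting exactly when the current combined
        # session ends is merged into it.
        if combined and start <= combined[-1][1]:
            # Nested sessions change nothing; chained ones extend the end.
            combined[-1][1] = max(combined[-1][1], end)
        else:
            combined.append([start, end])
    return [tuple(s) for s in combined]


for start, end in combine_sessions(SESSIONS):
    print(start, end)
```

This prints the two rows of the desired result. The SQL solutions resolve ties at the same dt by id instead of by a fixed rule.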
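Each solution assumes consistent data: every session that is closed was opened before (or at the same time, but with an earlier id). Unfinished sessions are reported as such (end_dt is null). If your data is less reliable, you need to define what can go wrong and handle those cases accordingly.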
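One way to define "what can go wrong" is sketched below in Python, under two illustrative assumptions: a closing event that would push the per-agent running count below zero is a stray close and gets ignored, and a positive count at the end of the log means a session is still open. check_consistency and the extra row with id 9 are hypothetical.

```python
from collections import defaultdict

# Sample log rows from the question as (id, dt, session_id, agent_id, state),
# where state 1 opens and state 0 closes a session.
SAMPLE = [
    (7, "2024-01-25 22:26:57", 4, 3148, 0),
    (7, "2024-01-25 22:24:57", 4, 3148, 1),
    (6, "2024-01-25 22:23:57", 2, 3148, 0),
    (5, "2024-01-25 15:53:30", 1, 3148, 0),
    (4, "2024-01-25 15:53:30", 3, 3148, 0),
    (3, "2024-01-25 13:53:02", 3, 3148, 1),
    (2, "2024-01-25 12:43:10", 2, 3148, 1),
    (1, "2024-01-25 12:30:02", 1, 3148, 1),
]
# Hypothetical extra row: a close while no session is open.
EVENTS = SAMPLE + [(9, "2024-01-25 23:00:00", 5, 3148, 0)]


def check_consistency(events):
    """Return (stray_closes, still_open): stray_closes lists (agent_id, id) of
    closing rows that would drive the running count below zero; still_open
    lists agents with a session left open at the end of the log."""
    running = defaultdict(int)
    stray_closes = []
    # Same order as the SQL window: per agent, by dt, then id.
    for row_id, _dt, _session, agent, state in sorted(events, key=lambda e: (e[3], e[1], e[0])):
        running[agent] += 1 if state == 1 else -1
        if running[agent] < 0:
            stray_closes.append((agent, row_id))
            running[agent] = 0  # assumption: ignore the stray close and carry on
    still_open = sorted(agent for agent, count in running.items() if count > 0)
    return stray_closes, still_open


print(check_consistency(EVENTS))
```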
SELECT agent_id
     , min(dt) AS start_dt
     , max(dt) FILTER (WHERE sum_state = 0) AS end_dt  -- !
FROM  (
   SELECT *
        , count(*) FILTER (WHERE sum_state = 0)
                   OVER (PARTITION BY agent_id ORDER BY dt DESC, id DESC) AS island
   FROM  (
      SELECT *
           , sum(CASE state WHEN 0 THEN -1 ELSE 1 END)
                 OVER (PARTITION BY agent_id ORDER BY dt, id) AS sum_state
      FROM   channel_subscribe
      ) sub1
   ) sub2
GROUP  BY agent_id, island
ORDER  BY agent_id, island DESC;
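The query in sub1 adds a running count per agent as sum_state: +1 for every opening event (state 1), -1 for every closing event (state 0). When sum_state returns to 0, a combined session ends.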
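The running count from sub1 can be reproduced outside the database. This Python sketch (add_sum_state is a hypothetical helper) sorts the sample rows by agent_id, dt, id and adds +1 for state 1 and -1 for state 0, like the window function does. It assumes state holds only 0 and 1, as in the sample.

```python
# Sample log rows as (id, dt, agent_id, state), copied from the question.
ROWS = [
    (7, "2024-01-25 22:26:57", 3148, 0),
    (7, "2024-01-25 22:24:57", 3148, 1),
    (6, "2024-01-25 22:23:57", 3148, 0),
    (5, "2024-01-25 15:53:30", 3148, 0),
    (4, "2024-01-25 15:53:30", 3148, 0),
    (3, "2024-01-25 13:53:02", 3148, 1),
    (2, "2024-01-25 12:43:10", 3148, 1),
    (1, "2024-01-25 12:30:02", 3148, 1),
]


def add_sum_state(rows):
    """Mimic sub1: running sum of +1 (open) / -1 (close) per agent,
    ordered by dt, then id."""
    totals = {}
    out = []
    for row_id, dt, agent, state in sorted(rows, key=lambda r: (r[2], r[1], r[0])):
        totals[agent] = totals.get(agent, 0) + (1 if state == 1 else -1)
        out.append((row_id, dt, agent, state, totals[agent]))
    return out


for row in add_sum_state(ROWS):
    print(row)
```

On the sample data sum_state first returns to 0 at 22:23:57, where the first combined session ends, and again at 22:26:57.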
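Subquery sub2 forms groups (island) by counting "0" events per agent, i.e. rows where sum_state is 0. Note the descending sort order (DESC), which puts each "0" event into the group it closes.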
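Continuing with the sample data, this Python sketch mimics the island count of sub2: it walks the rows newest first and counts the sum_state = 0 rows seen so far, including the current one. The input list is the sub1 result for agent 3148 on the sample, and add_island is a hypothetical name.

```python
# (dt, id, sum_state) per row for agent 3148: the sub1 result on the sample data.
SUB1 = [
    ("2024-01-25 12:30:02", 1, 1),
    ("2024-01-25 12:43:10", 2, 2),
    ("2024-01-25 13:53:02", 3, 3),
    ("2024-01-25 15:53:30", 4, 2),
    ("2024-01-25 15:53:30", 5, 1),
    ("2024-01-25 22:23:57", 6, 0),
    ("2024-01-25 22:24:57", 7, 1),
    ("2024-01-25 22:26:57", 7, 0),
]


def add_island(rows):
    """Mimic sub2: newest first, count rows with sum_state = 0 up to and
    including the current row, like count(*) FILTER (WHERE sum_state = 0)
    OVER (ORDER BY dt DESC, id DESC)."""
    zeros = 0
    out = []
    for dt, row_id, sum_state in sorted(rows, key=lambda r: (r[0], r[1]), reverse=True):
        if sum_state == 0:
            zeros += 1
        out.append((dt, row_id, sum_state, zeros))
    out.reverse()  # back to ascending order for display
    return out


for row in add_island(SUB1):
    print(row)
```

Counting in ascending order instead would give each closing row the number of the group that follows it, separating it from the session it closes.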
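The outer query reports one row per combined session, as requested. The added FILTER (WHERE sum_state = 0) makes sure that unfinished sessions are reported as such, with end_dt null.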
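The outer aggregation, sketched in Python under the same assumptions: group by island, take min(dt) as start_dt and the latest dt with sum_state = 0 as end_dt. The last input row is a hypothetical open event at 23:10:00 without a matching close, added to show why the FILTER clause matters; report is an illustrative name.

```python
# (dt, sum_state, island) per row for one agent: sub2 output on the sample data,
# plus one hypothetical unfinished session at the end.
SUB2 = [
    ("2024-01-25 12:30:02", 1, 2),
    ("2024-01-25 12:43:10", 2, 2),
    ("2024-01-25 13:53:02", 3, 2),
    ("2024-01-25 15:53:30", 2, 2),
    ("2024-01-25 15:53:30", 1, 2),
    ("2024-01-25 22:23:57", 0, 2),
    ("2024-01-25 22:24:57", 1, 1),
    ("2024-01-25 22:26:57", 0, 1),
    ("2024-01-25 23:10:00", 1, 0),  # hypothetical: opened, never closed
]


def report(rows):
    """Mimic the outer query: one row per island, ordered by island DESC, with
    min(dt) as start_dt and max(dt) FILTER (WHERE sum_state = 0) as end_dt
    (None when the island has no such row)."""
    result = []
    for island in sorted({r[2] for r in rows}, reverse=True):
        members = [r for r in rows if r[2] == island]
        closing = [dt for dt, sum_state, _ in members if sum_state == 0]
        start_dt = min(dt for dt, _, _ in members)
        result.append((start_dt, max(closing) if closing else None))
    return result


for row in report(SUB2):
    print(row)
```

Without the filter, the last group would report 23:10:00 as end_dt although that session never closed.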
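Once we need multiple subquery levels (with changing sort order), a procedural solution like this PL/pgSQL function may be faster, as it gets by with a single sort and one pass over the rows: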
CREATE OR REPLACE FUNCTION f_combined_sessions()
  RETURNS TABLE (agent_id int, start_dt timestamp, end_dt timestamp)
  LANGUAGE plpgsql AS
$func$
DECLARE
   r            record;
   _new_session bool;
BEGIN
   FOR r IN
      SELECT c.agent_id, c.dt
           , sum(CASE c.state WHEN 0 THEN -1 ELSE 1 END)
               OVER (PARTITION BY c.agent_id ORDER BY c.dt, c.id) AS sum_state
      FROM   channel_subscribe c
      ORDER  BY c.agent_id, c.dt, c.id
   LOOP
      IF agent_id = r.agent_id THEN     -- same agent
         IF _new_session THEN
            start_dt := r.dt;
            _new_session := false;
         END IF;
      ELSE                              -- new agent
         IF _new_session = false THEN   -- previous agent left a session open
            end_dt := null;
            RETURN NEXT;
         END IF;
         agent_id := r.agent_id;
         start_dt := r.dt;
         _new_session := false;
      END IF;

      IF r.sum_state = 0 THEN           -- end session
         end_dt := r.dt;
         RETURN NEXT;
         _new_session := true;
      END IF;
   END LOOP;

   -- return open session of the last agent
   IF _new_session = false THEN
      end_dt := null;
      RETURN NEXT;
   END IF;
END
$func$;
Call:
SELECT * FROM f_combined_sessions();
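For comparison, a Python transliteration of the single-pass loop in f_combined_sessions(). It consumes (agent_id, dt, sum_state) rows already sorted by agent_id, dt, id (what the FOR query produces) and reports an unfinished session both when the agent changes and at the end of the input. Agent 1000 and its row are hypothetical, added to exercise the open-session case; combined_sessions is an illustrative name.

```python
# (agent_id, dt, sum_state) rows sorted by agent_id, dt, id.
ROWS = [
    (1000, "2024-01-25 09:00:00", 1),  # hypothetical agent whose session stays open
    (3148, "2024-01-25 12:30:02", 1),
    (3148, "2024-01-25 12:43:10", 2),
    (3148, "2024-01-25 13:53:02", 3),
    (3148, "2024-01-25 15:53:30", 2),
    (3148, "2024-01-25 15:53:30", 1),
    (3148, "2024-01-25 22:23:57", 0),
    (3148, "2024-01-25 22:24:57", 1),
    (3148, "2024-01-25 22:26:57", 0),
]


def combined_sessions(rows):
    """Yield (agent_id, start_dt, end_dt) per combined session;
    end_dt is None for a session that is still open."""
    current_agent = None
    start_dt = None
    open_session = False
    for agent, dt, sum_state in rows:
        if agent != current_agent:
            if open_session:  # previous agent's session never closed
                yield (current_agent, start_dt, None)
            current_agent, start_dt, open_session = agent, dt, True
        elif not open_session:  # same agent, first event after a closed session
            start_dt, open_session = dt, True
        if sum_state == 0:  # all overlapping sessions closed
            yield (agent, start_dt, dt)
            open_session = False
    if open_session:  # last agent's session still open
        yield (current_agent, start_dt, None)


for row in combined_sessions(ROWS):
    print(row)
```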