Get combined intervals for overlapping connections


I want to calculate session intervals per agent from multiple, possibly overlapping connections. I have a log table like this in a PostgreSQL database:

id  dt                  session_id  agent_id state
7   2024-01-25 22:26:57 4           3148    0
7   2024-01-25 22:24:57 4           3148    1
6   2024-01-25 22:23:57 2           3148    0
5   2024-01-25 15:53:30 1           3148    0
4   2024-01-25 15:53:30 3           3148    0
3   2024-01-25 13:53:02 3           3148    1
2   2024-01-25 12:43:10 2           3148    1
1   2024-01-25 12:30:02 1           3148    1

For a single connection, I use the lag() function:

select agent_id, status, dt, lag(dt) over (partition by (agent_id) order by dt) as pre_status_dt  from 
(
    select *, 
      lag(status) over (partition by (agent_id) order by dt) as pre_status 
    from (
        select cs.dt, cs.user_id as agent_id, cs.status
        from "channel_subscribe" cs 
        where cs.channel = 14
    ) as t1
    where t1.dt >= '2024-01-01'
) as t2
where t2.status != t2.pre_status
order by dt asc

But this does not work for overlapping connections.

I want to get this result:

agent_id  start_dt             end_dt
3148      2024-01-25 12:30:02  2024-01-25 22:23:57
3148      2024-01-25 22:24:57  2024-01-25 22:26:57
sql postgresql intervals gaps-and-islands
1 Answer

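Your sample data shows nested sessions: additional sessions start and end only within the first open session of the same agent. Alternatively, sessions might be chained: the next session starts within the first one but ends later.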

The following solutions cover both cases: a combined session starts with the first open session and ends once all open sessions are closed.

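Each assumes consistent data: every closed session was opened before (or at the same time, but with a smaller id). Unfinished sessions are reported as such (end_dt is null). If your data is less reliable, you need to define what can go wrong and handle those cases accordingly.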
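To make the consistency assumption concrete, here is a minimal Python sketch (not part of the original answer) that flags close events with no earlier open event for the same agent and session. The function name `unmatched_closes` and the three sample rows are hypothetical, chosen only for illustration.

```python
def unmatched_closes(events):
    """Return ids of close events that have no earlier open event.

    events: iterable of (id, dt, session_id, agent_id, state),
    where state 1 = open and state 0 = close.
    """
    open_sessions = set()
    bad_ids = []
    # Same order as in the queries: by dt, ties broken by id.
    for id_, dt, session_id, agent_id, state in sorted(events, key=lambda e: (e[1], e[0])):
        key = (agent_id, session_id)
        if state == 1:
            open_sessions.add(key)
        elif key in open_sessions:
            open_sessions.remove(key)
        else:
            bad_ids.append(id_)  # closed without ever being opened
    return bad_ids


# Hypothetical rows: session 2 is closed but never opened.
rows = [
    (1, "2024-01-25 12:30:02", 1, 3148, 1),
    (2, "2024-01-25 12:40:00", 1, 3148, 0),
    (3, "2024-01-25 12:50:00", 2, 3148, 0),
]
print(unmatched_closes(rows))  # [3]
```

An empty result suggests the data meets the assumption; anything else needs a decision on how to treat those rows.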

Pure SQL

SELECT agent_id
     , min(dt) AS start_dt
     , max(dt) FILTER (WHERE sum_state = 0) AS end_dt  -- !
FROM  (
   SELECT *
        , count(*) FILTER (WHERE sum_state = 0)
          OVER (PARTITION BY agent_id ORDER BY dt DESC, id DESC) AS island
   FROM  (
      SELECT *
           , sum(CASE state WHEN 0 THEN -1 ELSE 1 END)
             OVER (PARTITION BY agent_id ORDER BY dt, id) sum_state
      FROM   channel_subscribe
      ) sub1
   ) sub2
GROUP  BY agent_id, island
ORDER  BY agent_id, island DESC;

The query in sub1 adds a running count per agent as sum_state. When sum_state returns to 0, the combined session ends.
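As an illustration outside the database, the running count can be reproduced in a few lines of Python. This is only a sketch for a single agent, using the sample rows from the question reduced to (id, dt, state); the variable names are assumptions.

```python
from itertools import accumulate

# Sample rows from the question as (id, dt, state); 1 = open, 0 = close.
# The question lists id 7 twice; the two rows differ in dt, so ordering is unaffected.
events = [
    (7, "2024-01-25 22:26:57", 0),
    (7, "2024-01-25 22:24:57", 1),
    (6, "2024-01-25 22:23:57", 0),
    (5, "2024-01-25 15:53:30", 0),
    (4, "2024-01-25 15:53:30", 0),
    (3, "2024-01-25 13:53:02", 1),
    (2, "2024-01-25 12:43:10", 1),
    (1, "2024-01-25 12:30:02", 1),
]

events.sort(key=lambda e: (e[1], e[0]))           # ORDER BY dt, id
deltas = [1 if state == 1 else -1 for _, _, state in events]
sum_state = list(accumulate(deltas))              # running count of open sessions

print(sum_state)  # [1, 2, 3, 2, 1, 0, 1, 0]
```

The two zeros mark the rows where the last open session closes, which matches the two end_dt values in the desired result.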

Subquery sub2 forms groups (island) by counting "0" events per agent. Note the descending sort order (DESC), which puts each "0" event into the group it closes.
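Why the descending order matters can be seen in a small Python sketch. It takes the sum_state values of one agent in chronological order (the values that result from the sample data) and counts zeros from the latest row backwards. The function name `label_islands` is hypothetical.

```python
from itertools import accumulate


def label_islands(sum_states):
    """Count rows with sum_state == 0 from the latest row backwards, inclusive."""
    labels = []
    zeros_seen = 0
    for s in reversed(sum_states):
        if s == 0:
            zeros_seen += 1
        labels.append(zeros_seen)
    labels.reverse()  # back to chronological order
    return labels


sum_states = [1, 2, 3, 2, 1, 0, 1, 0]  # chronological, one agent

print(label_islands(sum_states))  # [2, 2, 2, 2, 2, 2, 1, 1]

# Counting forwards instead would push each closing row into the next group:
forward = list(accumulate(1 if s == 0 else 0 for s in sum_states))
print(forward)  # [0, 0, 0, 0, 0, 1, 1, 2]
```

With the backward count, the closing row at position 6 shares label 2 with the rows it closes. With the forward count it would be grouped with the following open event instead.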

The outer query reports data as requested for each combined session. The added FILTER (WHERE sum_state = 0) makes sure that unfinished sessions are reported as such.
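The effect of the filter can be sketched in Python as well. The rows below are (dt, sum_state, island) for one agent. The first four are taken from the sample data; the last row is a hypothetical extra open event, added only to show an unfinished session. The function name `combine` is an assumption.

```python
def combine(rows):
    """Aggregate per island: min(dt) as start, max(dt) where sum_state == 0 as end."""
    groups = {}
    for dt, sum_state, island in rows:
        start, end = groups.get(island, (dt, None))
        start = min(start, dt)
        if sum_state == 0:  # only closing rows may set end_dt
            end = dt if end is None else max(end, dt)
        groups[island] = (start, end)
    # ORDER BY island DESC yields chronological order
    return [groups[k] for k in sorted(groups, reverse=True)]


rows = [
    ("2024-01-25 12:30:02", 1, 2),
    ("2024-01-25 22:23:57", 0, 2),
    ("2024-01-25 22:24:57", 1, 1),
    ("2024-01-25 22:26:57", 0, 1),
    ("2024-01-25 23:00:00", 1, 0),  # hypothetical: opened, never closed
]

for start_dt, end_dt in combine(rows):
    print(start_dt, end_dt)
```

Without the filter, a plain max(dt) would report 23:00:00 as the end of the last group, although that session is still open. With it, end_dt stays None (null in SQL).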

Procedural solution

Once we need multiple subquery levels (with changing sort order), a procedural solution like this PL/pgSQL function may be faster:

CREATE OR REPLACE FUNCTION f_combined_sessions()
  RETURNS TABLE (agent_id int, start_dt timestamp, end_dt timestamp)
  LANGUAGE plpgsql AS
$func$
DECLARE
   r            record;
   _new_session bool;
   _agent_id    int;
BEGIN
   FOR r IN
      SELECT c.agent_id, c.dt
           , sum(CASE c.state WHEN 0 THEN -1 ELSE 1 END)
             OVER (PARTITION BY c.agent_id ORDER BY c.dt, c.id) AS sum_state
      FROM   channel_subscribe c
      ORDER  BY c.agent_id,  c.dt, c.id
   LOOP
      IF agent_id = r.agent_id THEN  -- same agent
         IF _new_session = true THEN
            start_dt     := r.dt;
            _new_session := false;
         END IF;
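      ELSE                           -- new agent
         IF _new_session = false THEN  -- previous agent left a session open
            end_dt := null;
            RETURN NEXT;
         END IF;
         agent_id     := r.agent_id;
         start_dt     := r.dt;
         _new_session := false;
      END IF;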
      
      IF r.sum_state = 0 THEN        -- end session
         end_dt := r.dt;
         RETURN NEXT;
         _new_session := true;
      END IF;
   END LOOP;

   -- return open session?
   IF _new_session = false THEN
      end_dt := null;
      RETURN NEXT;
   END IF;
END
$func$;

Call:

SELECT * FROM f_combined_sessions();
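For readers who want to trace the single-pass idea outside the database, here is a rough Python analogue. It is a sketch, not a translation of the function: it tracks an open-session counter directly instead of a window-function result, it assumes consistent data as stated above, and the name `combined_sessions` and the sample events are hypothetical.

```python
def combined_sessions(events):
    """events: iterable of (agent_id, dt, id, state); 1 = open, 0 = close.

    Returns a list of (agent_id, start_dt, end_dt); end_dt is None
    for a combined session that is still open.
    """
    result = []
    agent = start = None
    open_count = 0
    # Tuple order gives ORDER BY agent_id, dt, id.
    for agent_id, dt, _id, state in sorted(events):
        if agent_id != agent:
            if open_count > 0:          # previous agent left a session open
                result.append((agent, start, None))
            agent, open_count = agent_id, 0
        if open_count == 0:             # first open event of a combined session
            start = dt
        open_count += 1 if state == 1 else -1
        if open_count == 0:             # all sessions closed
            result.append((agent, start, dt))
    if open_count > 0:                  # trailing open session
        result.append((agent, start, None))
    return result


# Hypothetical events for two agents; agent 1 ends with an open session.
events = [
    (1, "2024-01-25 10:00:00", 1, 1),
    (1, "2024-01-25 10:05:00", 2, 0),
    (1, "2024-01-25 11:00:00", 3, 1),
    (2, "2024-01-25 09:00:00", 4, 1),
    (2, "2024-01-25 09:10:00", 5, 1),
    (2, "2024-01-25 09:20:00", 6, 0),
    (2, "2024-01-25 09:30:00", 7, 0),
]
for row in combined_sessions(events):
    print(row)
```

The loop reads each row once in sorted order, which is the property that may make the procedural variant cheaper than nested window functions with opposite sort orders.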
