I have two tables: events and sessions.
events:
+-----------+---------------------+------+------------+
| event_id  | timestamp           | flag | session_id |
+-----------+---------------------+------+------------+
| kj123123j | 2020-01-01 22:51:11 |    0 |          1 |
| j24hjk234 | 2020-01-01 21:11:00 |    0 |          1 |
| kjh234khj | 2020-01-01 21:44:17 |    1 |          1 |
| 342hj24j3 | 2020-01-01 08:11:00 |    0 |          2 |
| kk1k12323 | 2020-01-01 13:55:12 |    1 |          2 |
| 890fd8sdf | 2020-01-01 20:55:14 |    0 |          2 |
+-----------+---------------------+------+------------+
sessions:
+------------+---------+
| session_id | user_id |
+------------+---------+
|          1 | 12kk    |
|          2 | 44qj    |
+------------+---------+
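What I want is a table that counts, for each user, the events that occurred before the flagged event (the row where flag = 1) in the session.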
+---------+-------+
| user_id | count |
+---------+-------+
| 12kk    |     1 |
| 44qj    |     1 |
+---------+-------+
I tried two approaches:
2.
WITH
events AS (
SELECT
events.event_id,
events.timestamp,
events.user_id
FROM
db.events events
LEFT JOIN
db.users users
ON
events.session_id = users.session_id),
flags AS (
SELECT
events.event_id,
events.timestamp
FROM
db.events events
WHERE
events.flag is TRUE )
SELECT
events.user_id,
SUM(CASE
WHEN events.timestamp < flags.timestamp THEN 1
ELSE
0
END
)
FROM
flags
JOIN
events
ON
events.event_id = flags.event_id
GROUP BY
events.user_id
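The problem with the second approach is that the count column contains only 0s, which cannot possibly be right.
Could I get some help solving this?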
One approach uses a window function and aggregation:
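-- count, per user, the events that precede the session's first flagged event
select s.user_id, countif(e.timestamp < e.timestamp_1)
from sessions s join
     (select e.*,
             -- earliest flagged timestamp within each session
             min(case when flag = 1 then timestamp end) over (partition by session_id) as timestamp_1
      from events e
     ) e
     on e.session_id = s.session_id
group by s.user_id;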
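To check the logic against the sample data from the question, here is a minimal sketch that loads the rows into an in-memory SQLite database from Python. This is only an illustration under a few assumptions: SQLite has no COUNTIF, so SUM(CASE ...) stands in for it; timestamps are stored as ISO-formatted text, which compares correctly as strings; and the SQLite build needs window-function support (3.25 or later). The names count_events_before_flag, first_flag_ts and events_before_flag are made up for this example.

```python
import sqlite3

# Sample rows copied from the tables in the question.
EVENTS = [
    ("kj123123j", "2020-01-01 22:51:11", 0, 1),
    ("j24hjk234", "2020-01-01 21:11:00", 0, 1),
    ("kjh234khj", "2020-01-01 21:44:17", 1, 1),
    ("342hj24j3", "2020-01-01 08:11:00", 0, 2),
    ("kk1k12323", "2020-01-01 13:55:12", 1, 2),
    ("890fd8sdf", "2020-01-01 20:55:14", 0, 2),
]
SESSIONS = [(1, "12kk"), (2, "44qj")]

# Same shape as the query above; SUM(CASE ...) replaces COUNTIF,
# which SQLite does not provide (assumption: SQLite >= 3.25).
QUERY = """
SELECT s.user_id,
       SUM(CASE WHEN e.timestamp < e.first_flag_ts THEN 1 ELSE 0 END)
           AS events_before_flag
FROM sessions AS s
JOIN (
    SELECT ev.*,
           -- earliest flagged timestamp within each session
           MIN(CASE WHEN ev.flag = 1 THEN ev.timestamp END)
               OVER (PARTITION BY ev.session_id) AS first_flag_ts
    FROM events AS ev
) AS e ON e.session_id = s.session_id
GROUP BY s.user_id
ORDER BY s.user_id
"""


def count_events_before_flag():
    """Build the sample tables in memory and return (user_id, count) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events "
        "(event_id TEXT, timestamp TEXT, flag INTEGER, session_id INTEGER)"
    )
    conn.execute("CREATE TABLE sessions (session_id INTEGER, user_id TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
    conn.executemany("INSERT INTO sessions VALUES (?, ?)", SESSIONS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for user_id, n in count_events_before_flag():
        print(user_id, n)
    # prints:
    # 12kk 1
    # 44qj 1
```

As for why the attempt in the question returns only 0s: as far as I can tell, joining flags to events on event_id pairs each flagged event only with itself, so events.timestamp < flags.timestamp can never be true. Comparing within the same session_id instead, as the window function does here, avoids that.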