Counting rows whose timestamp precedes a fixed row in the same table, where the fixed row has a specific value


I have two tables: events and sessions.

events:

+-----------+---------------------+------+------------+
| event_id  |      timestamp      | flag | session_id |
+-----------+---------------------+------+------------+
| kj123123j | 2020-01-01 22:51:11 |    0 |          1 |
| j24hjk234 | 2020-01-01 21:11:00 |    0 |          1 |
| kjh234khj | 2020-01-01 21:44:17 |    1 |          1 |
| 342hj24j3 | 2020-01-01 08:11:00 |    0 |          2 |
| kk1k12323 | 2020-01-01 13:55:12 |    1 |          2 |
| 890fd8sdf | 2020-01-01 20:55:14 |    0 |          2 |
+-----------+---------------------+------+------------+

sessions:

+------------+---------+
| session_id | user_id |
+------------+---------+
|          1 | 12kk    |
|          2 | 44qj    |
+------------+---------+

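What I want is a table that counts, for each user, the events that occurred before the flagged event (the row with flag = 1) in the same session.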

+---------+-------+
| user_id | count |
+---------+-------+
| 12kk    |     1 |
| 44qj    |     1 |
+---------+-------+

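I tried two approaches:

  1. A self-join on the table, which I could not test because it is very slow (the events table is large).

  2. The following query: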

WITH
  events AS (
  SELECT
    events.event_id,
    events.timestamp,
    events.user_id
  FROM
    db.events events
  LEFT JOIN
    db.users users
  ON
    events.session_id = users.session_id),
  flags AS (
  SELECT
    events.event_id,
    events.timestamp
  FROM
    db.events events
  WHERE
   events.flag is TRUE )
SELECT
  events.user_id,
  SUM(CASE
      WHEN events.timestamp < flags.timestamp THEN 1
    ELSE
    0
  END
    )
FROM
  flags
JOIN
  events
ON
  events.event_id = flags.event_id
GROUP BY
  events.user_id

The problem with the second approach is that the count column is 0 for every user, which cannot be right.

Could I get some help solving this?

sql google-bigquery
1 answer

One approach uses a window function and aggregation:

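select s.user_id,
       -- count the events that occurred strictly before the session's flagged event
       countif(e.timestamp < e.timestamp_1) as events_before_flag
from sessions s join
     (select e.*,
             -- earliest flagged timestamp in the session, attached to every row of that session
             min(case when flag = 1 then timestamp end) over (partition by session_id) as timestamp_1
      from events e
     ) e
     on e.session_id = s.session_id
group by s.user_id;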
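
To check this against the sample data from the question, here is a minimal sketch that runs the same idea in an in-memory SQLite database from Python. SQLite is only a stand-in for BigQuery here: it has no COUNTIF, so SUM over a CASE expression is used instead, timestamps are stored as ISO-formatted text, and the names flag_ts and events_before_flag are illustrative. The second query is a simplified version of the join in the question's attempt (not the original query verbatim). It shows why that attempt returns 0: joining the flagged rows back to events on event_id pairs each flagged event only with itself, so the timestamp comparison is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id TEXT, timestamp TEXT, flag INTEGER, session_id INTEGER);
CREATE TABLE sessions (session_id INTEGER, user_id TEXT);
INSERT INTO events VALUES
  ('kj123123j', '2020-01-01 22:51:11', 0, 1),
  ('j24hjk234', '2020-01-01 21:11:00', 0, 1),
  ('kjh234khj', '2020-01-01 21:44:17', 1, 1),
  ('342hj24j3', '2020-01-01 08:11:00', 0, 2),
  ('kk1k12323', '2020-01-01 13:55:12', 1, 2),
  ('890fd8sdf', '2020-01-01 20:55:14', 0, 2);
INSERT INTO sessions VALUES (1, '12kk'), (2, '44qj');
""")

# Window-function approach: attach each session's first flagged timestamp
# to every row of that session, then count the rows that precede it.
window_query = """
SELECT s.user_id,
       SUM(CASE WHEN e.timestamp < e.flag_ts THEN 1 ELSE 0 END) AS events_before_flag
FROM sessions s
JOIN (
    SELECT ev.*,
           MIN(CASE WHEN flag = 1 THEN timestamp END)
               OVER (PARTITION BY session_id) AS flag_ts
    FROM events ev
) e ON e.session_id = s.session_id
GROUP BY s.user_id
ORDER BY s.user_id
"""
rows = conn.execute(window_query).fetchall()

# Simplified version of the question's join: flagged rows joined to events
# on event_id, so each flagged event is compared only with itself.
flawed_query = """
SELECT s.user_id,
       SUM(CASE WHEN e.timestamp < f.timestamp THEN 1 ELSE 0 END)
FROM events f
JOIN events e ON e.event_id = f.event_id
JOIN sessions s ON s.session_id = e.session_id
WHERE f.flag = 1
GROUP BY s.user_id
ORDER BY s.user_id
"""
flawed_rows = conn.execute(flawed_query).fetchall()

print(rows)         # [('12kk', 1), ('44qj', 1)]
print(flawed_rows)  # [('12kk', 0), ('44qj', 0)]
```

Window functions need SQLite 3.25 or later. A session with no flagged event gets a NULL flag timestamp, so none of its events are counted.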