Including zero counts in grouped aggregations


I want to include zero counts in my daily, weekly, and monthly aggregations.

My date table is as follows:

date_range AS (
  SELECT DATE_SUB(CURRENT_DATE(), INTERVAL OFFSET DAY) AS date
  FROM UNNEST(GENERATE_ARRAY(0, 27)) AS OFFSET
)

My daily aggregation code is:

day_cte AS (
  SELECT
    date_range.date AS event_date,
    userbase.user_pseudo_id,
    COUNT(events.event_date) AS num_of_sessions
  FROM
    date_range
  CROSS JOIN
    userbase
  LEFT JOIN
    `app.analytics_317927526.events_*` AS events
  ON
    DATE(PARSE_DATE('%Y%m%d', events.event_date)) = date_range.date
    AND events.event_name = 'session_start'
    AND events.user_pseudo_id = userbase.user_pseudo_id
  GROUP BY
    date_range.date, userbase.user_pseudo_id
)

My weekly aggregation code is:

week_cte AS (
  SELECT
    userbase.user_pseudo_id,
    DATE_TRUNC(date_range.date, WEEK) AS event_week,
    COUNT(*) AS num_of_sessions
  FROM
    date_range
  CROSS JOIN
    userbase
  LEFT JOIN
    `app.analytics_317927526.events_*` AS events
  ON
    DATE(PARSE_DATE('%Y%m%d', events.event_date)) = date_range.date
    AND events.event_name = 'session_start'
    AND events.user_pseudo_id = userbase.user_pseudo_id
  GROUP BY
    1, 2
),

My monthly aggregation code is:

month_cte AS (
  SELECT
    userbase.user_pseudo_id,
    DATE_TRUNC(date_range.date, MONTH) AS event_month,
    COUNT(*) AS num_of_sessions
  FROM
    date_range
  CROSS JOIN
    userbase
  LEFT JOIN
    `app.analytics_317927526.events_*` AS events
  ON
    DATE(PARSE_DATE('%Y%m%d', events.event_date)) = date_range.date
    AND events.event_name = 'session_start'
    AND events.user_pseudo_id = userbase.user_pseudo_id
  GROUP BY
    1, 2
),


I just want to confirm that I am doing this correctly, because the weekly and monthly aggregations seem to produce unexpected results.

The daily results seem reasonable.

sql google-bigquery
1 Answer

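Your weekly and monthly queries do not handle zero counts correctly, and the cause is the combination of the LEFT JOIN with COUNT(*). When a user has no events on a given date, the LEFT JOIN still produces a row for that date and user, with NULL in every column that comes from the events table. COUNT(*) counts that row as 1, so a user with no sessions in a week is reported with one "session" per day of that week instead of 0. Your daily query avoids this because it uses COUNT(events.event_date), which ignores NULLs. Truncating the date to a week or month only changes how the rows are grouped; it does not remove the unmatched rows.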
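The difference between counting rows and counting a column from the right-hand side of a LEFT JOIN can be seen in a small self-contained query. This is only a sketch: the dates, the user id `u1`, and the inline `events` data are made up for illustration and stand in for the real export table.

```sql
-- Toy data (hypothetical): three days, one session on 2024-01-02.
WITH date_range AS (
  SELECT d AS date
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2024-01-01', DATE '2024-01-03')) AS d
),
events AS (
  SELECT DATE '2024-01-02' AS event_date, 'u1' AS user_pseudo_id
)
SELECT
  COUNT(*) AS count_star,                   -- counts every joined row, matched or not
  COUNT(events.event_date) AS count_column  -- counts only rows where an event matched
FROM date_range
LEFT JOIN events
  ON events.event_date = date_range.date;
```

With this data, `count_star` is 3 (one row per day) while `count_column` is 1 (the single real session), which is the same inflation the weekly and monthly queries show.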

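To handle zero counts correctly in the weekly and monthly aggregations, keep the full date range crossed with the user base, left-join the events table to it, and count a column from the events table rather than every row. The weekly and monthly queries can be adjusted as follows: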


week_cte AS (
  SELECT
    userbase.user_pseudo_id,
    DATE_TRUNC(date_range.date, WEEK) AS event_week,
    -- COUNT(column) skips the NULLs that the LEFT JOIN produces for days without sessions
    COUNT(events.event_date) AS num_of_sessions
  FROM
    date_range  -- the 28-day date_range CTE from the question
  CROSS JOIN
    userbase
  LEFT JOIN
    (SELECT PARSE_DATE('%Y%m%d', event_date) AS event_date, user_pseudo_id
     FROM `app.analytics_317927526.events_*`
     WHERE event_name = 'session_start') AS events
  ON
    -- join on the day, not the week, so each session matches exactly one date_range row
    events.event_date = date_range.date
    AND events.user_pseudo_id = userbase.user_pseudo_id
  GROUP BY
    userbase.user_pseudo_id, event_week
),

month_cte AS (
  SELECT
    userbase.user_pseudo_id,
    DATE_TRUNC(date_range.date, MONTH) AS event_month,
    -- COUNT(column) skips the NULLs that the LEFT JOIN produces for days without sessions
    COUNT(events.event_date) AS num_of_sessions
  FROM
    date_range  -- the 28-day date_range CTE from the question
  CROSS JOIN
    userbase
  LEFT JOIN
    (SELECT PARSE_DATE('%Y%m%d', event_date) AS event_date, user_pseudo_id
     FROM `app.analytics_317927526.events_*`
     WHERE event_name = 'session_start') AS events
  ON
    -- join on the day, not the month, so each session matches exactly one date_range row
    events.event_date = date_range.date
    AND events.user_pseudo_id = userbase.user_pseudo_id
  GROUP BY
    userbase.user_pseudo_id, event_month
)

These adjustments should ensure that your weekly and monthly aggregations include zero counts correctly.
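One way to sanity-check an aggregation like this is to compare its total against the raw number of sessions: if the join neither drops nor duplicates rows, the two should match. The query below is a minimal sketch on made-up data; the inline `userbase` and `events` CTEs, the user ids, and the dates are hypothetical stand-ins for the real tables.

```sql
-- Toy data (hypothetical): 14 days, two users, three sessions for u1, none for u2.
WITH date_range AS (
  SELECT d AS date
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2024-01-01', DATE '2024-01-14')) AS d
),
userbase AS (
  SELECT 'u1' AS user_pseudo_id UNION ALL
  SELECT 'u2'
),
events AS (
  SELECT DATE '2024-01-02' AS event_date, 'u1' AS user_pseudo_id UNION ALL
  SELECT DATE '2024-01-03', 'u1' UNION ALL
  SELECT DATE '2024-01-10', 'u1'
),
week_cte AS (
  SELECT
    userbase.user_pseudo_id,
    DATE_TRUNC(date_range.date, WEEK) AS event_week,
    COUNT(events.event_date) AS num_of_sessions  -- NULLs from unmatched days are ignored
  FROM date_range
  CROSS JOIN userbase
  LEFT JOIN events
    ON events.event_date = date_range.date
   AND events.user_pseudo_id = userbase.user_pseudo_id
  GROUP BY 1, 2
)
SELECT
  (SELECT SUM(num_of_sessions) FROM week_cte) AS weekly_total,  -- should equal raw_sessions
  (SELECT COUNT(*) FROM events) AS raw_sessions,
  (SELECT COUNT(*) FROM week_cte WHERE num_of_sessions = 0) AS zero_count_rows;
```

If `weekly_total` is larger than `raw_sessions`, the join is multiplying rows or the count is including unmatched rows; a non-zero `zero_count_rows` confirms that users and weeks without sessions are still present in the output.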
