How to select from both the events_* and events_intraday_* tables using SQL in Google BigQuery

Question (votes: 0, answers: 2)

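I'm new to BigQuery and SQL, and I'm trying to pull data out of Google BigQuery. I have a query that works against either the events_* tables or the events_intraday_* tables, but I'm still at an early stage and would really like a single query that returns all the data from both sets of tables. It seems like this should be trivial, but nothing I've tried has worked, and I can't find an example in any documentation that shows how to combine UNNEST with multiple tables in Google SQL.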

Here is my original working query:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 1:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_xxx.events_intraday_*`, `analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 1 error: Column name event_params is ambiguous at [17:15]

Variation 2:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_356435236.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param,
  `analytics_356435236.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 2 error: Column name event_params is ambiguous at [19:15]

Variation 3:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  (`analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param),
  (`analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 3 error: Syntax error: Expected keyword JOIN but got "," at [16:48]

Variation 4:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  (`analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
JOIN
  (`analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

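Variation 4 error: Syntax error: Expected keyword JOIN but got "," at [16:48]

...and so on. I've tried a dozen or so other variations, but nothing has worked. Any help would be greatly appreciated!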

Tags: sql, google-bigquery
2 Answers
1 vote

To see what would happen, I asked ChatGPT, and it gave me the following code, which appears to work perfectly:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM (
  SELECT * FROM `analytics_xxx.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  
  UNION ALL
  
  SELECT * FROM `analytics_xxx.events_intraday_*`
  WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
), UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

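I had tried UNION ALL before, but the key change seems to be splitting the WHERE clause in two: one part inside each subquery that selects tables by date suffix, and one part in the outer query that filters on the event parameters.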
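A possible alternative, offered as an untested sketch rather than a confirmed solution: the wildcard events_* also matches the events_intraday_* tables, and for those tables _TABLE_SUFFIX looks like intraday_YYYYMMDD instead of YYYYMMDD. Stripping that prefix before the date comparison lets a single wildcard scan cover both sets while keeping the original UNNEST joins. The dates are placeholder literals, the column list is shortened for brevity, and the sketch assumes the dataset holds no other tables whose names start with events_.

```sql
-- Sketch only: dataset name and dates are placeholders.
SELECT
  user_id,
  TIMESTAMP_MICROS(event_timestamp) AS timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id
FROM
  `analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  -- _TABLE_SUFFIX is 'YYYYMMDD' for daily tables and
  -- 'intraday_YYYYMMDD' for intraday tables; normalize both to 'YYYYMMDD'.
  REPLACE(_TABLE_SUFFIX, 'intraday_', '') BETWEEN '20240101' AND '20240131'
  AND event_name = 'experiment_viewed'
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id IS NOT NULL
```

Wrapping _TABLE_SUFFIX in a function may change how many tables BigQuery scans, so it is worth comparing the bytes processed against the UNION ALL version before relying on it.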


0 votes

@Jaytiger

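I'm posting this as an answer because I don't have enough reputation to comment. Intraday data is only "merged" into the events tables once the day is over.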

So if you want current, real-time data, you have to use the intraday tables.

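@OP, here is the general idea. You don't need to filter the intraday data by month and year, because intraday tables for previous dates are deleted automatically. To be safe, though, select only the latest one.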

select *
from
(
    (select * 
    from `analytics_fillyourid.events_*` 
    where _TABLE_SUFFIX BETWEEN 
        FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) 
        and FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
    
    UNION ALL

    (select * 
    from `analytics_fillyourid.events_intraday_*`
    where _TABLE_SUFFIX = FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
)
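One caveat worth sketching: UNION ALL matches columns by position, so select * in both branches only works while the daily and intraday tables have identical schemas. A more defensive variant, assuming only a handful of columns are needed, names them explicitly in each branch. The column list here is illustrative and the dataset name is the same placeholder as above.

```sql
-- Sketch only: list just the columns you need so both branches
-- line up even if the daily and intraday schemas differ elsewhere.
SELECT user_id, event_name, event_timestamp, event_params
FROM `analytics_fillyourid.events_*`
WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())

UNION ALL

SELECT user_id, event_name, event_timestamp, event_params
FROM `analytics_fillyourid.events_intraday_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
```

The result can then be wrapped in a subquery and joined to UNNEST(event_params) in the same way as in the first answer.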
