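I'm new to BigQuery and SQL. I'm trying to return data from Google's BigQuery. I have a query that can pull data from tables of the form events_* or events_intraday_*, but it's early days for me and I'd really like to pull all the data from both sets of tables with a single query. It seems like this should be trivial, but nothing I've tried has worked. I can't find an example anywhere in the documentation that explains how to combine UNNEST with multiple tables in Google SQL.
Here is my original, working query: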
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 1:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_xxx.events_intraday_*`, `analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 1 error: Column name event_params is ambiguous at [17:15]
Variation 2:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_356435236.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param,
`analytics_356435236.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 2 error: Column name event_params is ambiguous at [19:15]
Variation 3:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
(`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param),
(`analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 3 error: Syntax error: Expected keyword JOIN but got "," at [16:48]
Variation 4:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
(`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
JOIN
(`analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
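Variation 4 error: Syntax error: Expected keyword JOIN but got "," at [16:48]
...and so on. I tried a dozen or so other variations, but nothing worked. Any help would be greatly appreciated!
To see what would happen, I asked ChatGPT, and it gave me the following code, which seems to run perfectly: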
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM (
SELECT * FROM `analytics_xxx.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
UNION ALL
SELECT * FROM `analytics_xxx.events_intraday_*`
WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
), UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
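I had tried "UNION ALL" before, but the key change seems to be splitting the "WHERE" into two parts: one inside the subquery that selects tables by date suffix, and one outside it that selects records by parameter.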
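For anyone who lands here with the same errors, the difference between the failing variations and the working query can be reproduced on a toy example. This is only a sketch: `daily` and `intraday` are made-up stand-ins for the two table families, and the data in them is invented. A comma between two tables is a cross join, so both sides expose an `event_params` column and the unqualified name becomes ambiguous. `UNION ALL` inside a subquery stacks the rows into one table first, which leaves a single `event_params` for UNNEST to read.

```sql
-- Hypothetical stand-ins for events_* and events_intraday_* (invented data).
WITH daily AS (
  SELECT
    'u1' AS user_id,
    [STRUCT('experiment_id' AS key, 'exp_a' AS value)] AS event_params
),
intraday AS (
  SELECT
    'u2' AS user_id,
    [STRUCT('experiment_id' AS key, 'exp_b' AS value)] AS event_params
)
-- Writing "FROM daily, intraday, UNNEST(event_params) AS p" here would fail:
-- the comma cross-joins the two tables, and both have an event_params column.
SELECT
  user_id,
  p.key,
  p.value
FROM (
  -- Stack the rows of both tables into one result first...
  SELECT * FROM daily
  UNION ALL
  SELECT * FROM intraday
),
-- ...so that there is exactly one event_params column to unnest.
UNNEST(event_params) AS p
```

With real wildcard tables, the `_TABLE_SUFFIX` filter has to sit inside each branch of the `UNION ALL`, because the pseudo-column is not part of `SELECT *` and so is not visible outside the subquery. That is most likely why moving it made the query work.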
@Jaytiger
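I'm posting this as an answer because I don't have enough reputation to comment. Intraday data is only "merged" into the events tables once the day has ended.
So if you want current, live data, you have to use the intraday tables.
@OP. This is the general idea. You don't need to filter your intraday data by month and year. Intraday tables for previous dates are deleted automatically. But to be on the safe side, use only the latest one.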
select *
from
(
(select *
from `analytics_fillyourid.events_*`
where _TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
and FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
UNION ALL
(select *
from `analytics_fillyourid.events_intraday_*`
where _TABLE_SUFFIX = FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
)
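To connect this back to the columns in the question, here is a rough sketch of how the rolling-window subquery above might be combined with the parameter extraction. It is untested against a real export and makes a few assumptions: the CTE names `all_events` and `experiment_views` are mine, `analytics_fillyourid` is still a placeholder, and each parameter key is assumed to appear at most once per event (a scalar subquery raises an error if it returns more than one row). It swaps the two `UNNEST` joins from the question for scalar subqueries, which is a different technique from the original query but should give the same rows once the NULL filter at the end is applied.

```sql
WITH all_events AS (
  -- Daily tables for the last 30 days.
  SELECT *
  FROM `analytics_fillyourid.events_*`
  WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  UNION ALL
  -- Today's intraday table only.
  SELECT *
  FROM `analytics_fillyourid.events_intraday_*`
  WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
),
experiment_views AS (
  SELECT
    user_id,
    TIMESTAMP_MICROS(event_timestamp) AS timestamp,
    -- Pull each parameter out of the event_params array by key.
    (SELECT ep.value.string_value
     FROM UNNEST(event_params) AS ep
     WHERE ep.key = 'experiment_id') AS experiment_id,
    (SELECT ep.value.int_value
     FROM UNNEST(event_params) AS ep
     WHERE ep.key = 'variation_id') AS variation_id,
    geo.country AS country,
    traffic_source.source AS source,
    traffic_source.medium AS medium,
    device.category AS device,
    device.web_info.browser AS browser,
    device.operating_system AS os
  FROM all_events
  WHERE event_name = 'experiment_viewed'
    AND user_id IS NOT NULL
)
SELECT *
FROM experiment_views
-- Mirror the original join semantics: drop events missing either parameter.
WHERE experiment_id IS NOT NULL
  AND variation_id IS NOT NULL
```

One thing to watch: `CURRENT_DATE()` defaults to UTC, so if your property reports in another time zone you may want to pass that zone explicitly, e.g. `CURRENT_DATE('America/New_York')`.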