How can I unnest an array in BigQuery without using UNNEST?


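I am trying to select hits.page.searchKeyword in my query, but I get the classic nesting error. Do I have any option other than unnesting? When I unnest the hits data (GA3 - UA), every other selected column repeats its values, and cleaning that up is messy. All I need is to add the search term, which I have identified in the schema as hits.page.searchKeyword, to my query. But I need to flatten the hits array without the UNNEST function. Can someone please help?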

SELECT 
  parse_date('%Y%m%d', date) AS date,
  channelGrouping,trafficSource.campaign, 
  hits.page.searchKeyword as searchterm,
  SUM(totals.visits) AS sessions, 
  count(distinct clientId),
  SUM(totals.transactions) AS transactions,
  SUM(totals.totalTransactionRevenue) AS revenue1mil
FROM `vdxl-prod-data-reporting-01.200759185.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20231102'
GROUP BY 1, channelGrouping, trafficSource.campaign, hits.page.searchKeyword
ORDER BY 1 DESC
Cannot access field page on a value with type ARRAY<STRUCT<hitNumber INT64, time INT64, hour INT64, ...>> at [4:8]
1 Answer

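UNNEST() only converts an array into a relation. It does not automatically join that relation with the main table. That only happens when you write something like FROM `vdxl-prod-data-reporting-01.200759185.ga_sessions_*` t CROSS JOIN t.hits (sometimes CROSS JOIN is replaced by its shorthand, the comma).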
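
To make the join behaviour concrete, here is a minimal sketch that runs without the real table. The sessions CTE and its session_id column are hypothetical stand-ins for ga_sessions_*, and the hits array is cut down to the one field that matters here.

```sql
-- Hypothetical inline data standing in for ga_sessions_*; session_id is an
-- illustrative column, not part of the GA3 export schema.
WITH sessions AS (
  SELECT 'A' AS session_id,
         [STRUCT(STRUCT('shoes' AS searchKeyword) AS page),
          STRUCT(STRUCT('boots' AS searchKeyword) AS page)] AS hits
  UNION ALL
  SELECT 'B' AS session_id,
         [STRUCT(STRUCT('hats' AS searchKeyword) AS page)] AS hits
)
SELECT
  s.session_id,
  h.page.searchKeyword AS searchterm
FROM sessions AS s
CROSS JOIN UNNEST(s.hits) AS h  -- equivalent: FROM sessions AS s, s.hits AS h
ORDER BY s.session_id, searchterm;
```

Session A comes back twice, once per hit, and session B once. That repetition comes from the correlated join, not from UNNEST() itself.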

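hits.page.searchKeyword is the internal site search term, which means a single session can contain several searches. How do you want to aggregate them to session scope?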

One approach is to build a de-duplicated list of those keywords per session, sorted by keyword:

SELECT 
  parse_date('%Y%m%d', date) AS date,
  channelGrouping,
  trafficSource.campaign, 
  (select string_agg(distinct page.searchKeyword order by page.searchKeyword) from unnest(hits)) as searchterm,
  SUM(totals.visits) AS sessions, 
  count(distinct clientId),
  SUM(totals.transactions) AS transactions,
  SUM(totals.totalTransactionRevenue) AS revenue1mil
FROM `vdxl-prod-data-reporting-01.200759185.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20231102'
GROUP BY 1, 2, 3, 4
ORDER BY 1 DESC
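
The scalar subquery is what keeps the session row from fanning out: UNNEST(hits) is evaluated inside the subquery, once per session, and collapses to a single string. A small sketch on hypothetical inline data (the sessions CTE, session_id and visits are assumptions for illustration only):

```sql
-- Hypothetical inline data; session A repeats a keyword and has one hit
-- without a search term, as most real hits would.
WITH sessions AS (
  SELECT 'A' AS session_id, 1 AS visits,
         [STRUCT(STRUCT('shoes' AS searchKeyword) AS page),
          STRUCT(STRUCT('boots' AS searchKeyword) AS page),
          STRUCT(STRUCT('shoes' AS searchKeyword) AS page),
          STRUCT(STRUCT(CAST(NULL AS STRING) AS searchKeyword) AS page)] AS hits
  UNION ALL
  SELECT 'B' AS session_id, 1 AS visits,
         [STRUCT(STRUCT('hats' AS searchKeyword) AS page)] AS hits
)
SELECT
  session_id,
  visits,
  -- One string per session: NULL keywords are skipped, duplicates removed.
  (SELECT STRING_AGG(DISTINCT h.page.searchKeyword ORDER BY h.page.searchKeyword)
   FROM UNNEST(hits) AS h) AS searchterm
FROM sessions
ORDER BY session_id;
```

Each session stays on one row, so session-scoped totals such as visits can still be summed safely. Note that a session with no searches at all yields a NULL searchterm.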

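If you want one keyword per row, you need the cross join. It will of course repeat the session-scoped row for every hit in that session; that is what a join does. You then have to compute the metrics at hit scope and can no longer use the session aggregates/totals: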

SELECT 
  parse_date('%Y%m%d', date) AS date,
  channelGrouping,
  trafficSource.campaign, 
  h.page.searchKeyword,
  count(distinct fullvisitorid || cast(visitstarttime as string)) AS sessions, 
  count(distinct clientId) as clients,
  countif(h.ecommerceaction.action_type='6') AS transactions,
  SUM(h.transaction.transactionrevenue /1000000) AS revenue
FROM `vdxl-prod-data-reporting-01.200759185.ga_sessions_*` as t
  CROSS JOIN t.hits as h
WHERE 
  _TABLE_SUFFIX BETWEEN '20230101' AND '20231102'
GROUP BY 
  1, 2, 3, 4
ORDER BY 
  1 DESC, 2, 3, 4
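
Why the session totals stop working after the join can be seen on a toy example. This is only a sketch: the sessions CTE, session_id and visits are hypothetical stand-ins for the real export columns.

```sql
-- Hypothetical inline data; session A searches for 'shoes' in two hits.
WITH sessions AS (
  SELECT 'A' AS session_id, 1 AS visits,
         [STRUCT(STRUCT('shoes' AS searchKeyword) AS page),
          STRUCT(STRUCT('shoes' AS searchKeyword) AS page),
          STRUCT(STRUCT('boots' AS searchKeyword) AS page)] AS hits
  UNION ALL
  SELECT 'B' AS session_id, 1 AS visits,
         [STRUCT(STRUCT('shoes' AS searchKeyword) AS page)] AS hits
)
SELECT
  h.page.searchKeyword AS searchterm,
  SUM(s.visits) AS inflated_visits,         -- counts one visit per hit row
  COUNT(DISTINCT s.session_id) AS sessions  -- counts each session once
FROM sessions AS s
CROSS JOIN UNNEST(s.hits) AS h
GROUP BY 1
ORDER BY 1;
```

For 'shoes', the summed session total comes out as 3 although only 2 sessions searched for it, which is why the query above counts distinct session keys instead of summing totals.visits.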