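My task is to total up values across multiple arrays, and I have hit the limit of my knowledge. I would greatly appreciate this community's insight and help.
Challenge:
I have an array of domain TLDs in each row of a single-column BigQuery table. I want to group by each TLD and return the total count for each TLD as a new table.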
["biz","us","international","eu","com","co","world","us","international","eu","co","biz"]
["com","co","world"]
Expected result:
**TLD_Name**
biz 2
us 2
international 2
eu 2
com 2
co 3
world 2
Thanks in advance for your help.
Assuming the array column is named tlds, you can run the following standard SQL query:
SELECT
tld AS TLD_Name,
COUNT(*) AS count
FROM YourTable
CROSS JOIN UNNEST(tlds) AS tld
GROUP BY tld;
This has the effect of "flattening" the arrays into one row per element and then counting how many times each TLD occurs.
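To sanity-check what such a query should return without running it in BigQuery, the flatten-and-count logic can be mimicked locally. The following is only a rough Python sketch of the same idea, not BigQuery code; `rows` holds the sample arrays from the question, and `count_tlds` is a hypothetical helper name introduced here for illustration.

```python
from collections import Counter
from itertools import chain

# Sample data from the question; each inner list stands in for one row's `tlds` array.
rows = [
    ["biz", "us", "international", "eu", "com", "co", "world",
     "us", "international", "eu", "co", "biz"],
    ["com", "co", "world"],
]

def count_tlds(rows):
    """Flatten every array (like CROSS JOIN UNNEST) and count each value (like GROUP BY)."""
    return Counter(chain.from_iterable(rows))

counts = count_tlds(rows)
for tld, n in counts.items():
    print(tld, n)
```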
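If the tld values within each row are highly repetitive and you have a very large number of rows, the query below may offer a small optimization: it first aggregates the tld counts within each row and then sums them across the whole table (for BigQuery Standard SQL):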
#standardSQL
WITH `yourproject.yourdataset.yourtable` AS (
SELECT ["biz","us","international","eu","com","co","world","us","international","eu","co","biz"] tlds UNION ALL
SELECT ["com","co","world","biz"]
)
SELECT
tld_count.tld AS tld,
SUM(tld_count.cnt) AS cnt
FROM `yourproject.yourdataset.yourtable`,
UNNEST(ARRAY(SELECT AS STRUCT tld, COUNT(*) AS cnt FROM UNNEST(tlds) AS tld GROUP BY tld)) AS tld_count
GROUP BY tld
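The two-stage idea can also be illustrated outside BigQuery. This is a hedged Python sketch, not a statement about how BigQuery executes the query: it counts within each row first, then sums the per-row counts, using the same sample data as the query above. The function names `two_stage_count` and `flat_count` are hypothetical.

```python
from collections import Counter

# Same sample rows as in the WITH clause of the query above.
rows = [
    ["biz", "us", "international", "eu", "com", "co", "world",
     "us", "international", "eu", "co", "biz"],
    ["com", "co", "world", "biz"],
]

def two_stage_count(rows):
    """Count per row first, then sum the per-row counts across all rows."""
    total = Counter()
    for tlds in rows:
        per_row = Counter(tlds)  # stage 1: per-row (tld, cnt) pairs, like the ARRAY(SELECT AS STRUCT ...) subquery
        total.update(per_row)    # stage 2: add the counts together, like SUM(tld_count.cnt) with GROUP BY tld
    return total

def flat_count(rows):
    """Baseline: flatten everything and count in a single pass."""
    return Counter(tld for tlds in rows for tld in tlds)

result = two_stage_count(rows)
print(result == flat_count(rows))
```

Both functions produce the same totals; the two-stage form only reduces how many intermediate items are carried into the final aggregation when rows contain many repeated values.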