这是我在BigQuery中使用公共数据集的查询:
SELECT RANGE_BUCKET(reputation, [400000, 500000, 600000, 700000, 800000, 900000, 1000000, 1100000, 1200000]) AS reputation_group, COUNT(*) AS count
FROM `bigquery-public-data.stackoverflow.users`
Where reputation > 200000
GROUP BY 1
ORDER By 1
结果如下:
而不是将信誉组显示为整数,我如何显示存储桶的范围:
0: [0-400000]
1: [400001-500000]
2: [500001-600000]
....
非常感谢。
UPDATE:非常感谢米哈伊尔(Mikhail)的回答,并在下面做了一些小改动:
SELECT bucket,
FORMAT('%i - %i', IFNULL(ranges[SAFE_OFFSET(bucket - 1)] + 1, 0), ranges[SAFE_OFFSET(bucket)]) AS reputation_group,
COUNT(*) AS COUNT
FROM `bigquery-public-data.stackoverflow.users`,
UNNEST([STRUCT([200000, 400000, 500000, 600000, 700000, 800000, 900000, 1000000, 1100000, 1200000] AS ranges)]),
UNNEST([RANGE_BUCKET(reputation, ranges)]) bucket
WHERE reputation > 200000
GROUP BY 1, 2
ORDER BY bucket
请注意,将额外的200000项添加到STRUCT,结果显示为200001 - 400000
代替0 - 400000
下面是BigQuery标准SQL的内容>>
#standardSQL
SELECT bucket,
FORMAT('%i - %i', IFNULL(ranges[SAFE_OFFSET(bucket - 1)] + 1, 0), ranges[SAFE_OFFSET(bucket)]) AS reputation_group,
COUNT(*) AS COUNT
FROM `bigquery-public-data.stackoverflow.users`,
UNNEST([STRUCT([400000, 500000, 600000, 700000, 800000, 900000, 1000000, 1100000, 1200000] AS ranges)]),
UNNEST([RANGE_BUCKET(reputation, ranges)]) bucket
WHERE reputation > 200000
GROUP BY 1, 2
ORDER BY bucket
带有JOIN
和一些重构:
非常感谢@Mikhail Berlyant和@Felipe Hoffa,这是Mikhail脚本中的一个很小的补充: