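I'm having a problem with COUNT(DISTINCT ...) combined with CASE when using GROUP BY ROLLUP in BigQuery. The result differs from what I expected. Here is the code: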
with tb1 as
(
select 'DN' as geography
,'1012658993824' as SKU
,1 as pageview
union all
select 'KR' as geography
,'1012658993824' as SKU
,7 as pageview
)
select geography
,count(distinct(case when pageview between 1 and 5 then SKU end)) as PV_from_0_to_5
,count(distinct(case when pageview between 6 and 10 then SKU end)) as PV_from_6_to_10
from tb1
group by rollup (1)
**Output:**

| geography | PV_from_0_to_5 | PV_from_6_to_10 |
|---|---|---|
| NULL | 1 | 1 |
| DN | 1 | 0 |
| KR | 0 | 1 |
**Expected Output:**

| geography | PV_from_0_to_5 | PV_from_6_to_10 |
|---|---|---|
| NULL | 0 | 1 |
| DN | 1 | 0 |
| KR | 0 | 1 |
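Explanation: as you can see, I'm counting page views for a single SKU in two locations, DN and KR. The problem is that, in the ROLLUP total row, I want the page views to be summed first (1 + 7 = 8 page views) and only then have the distinct SKUs counted per bucket. I can't put SUM inside COUNT(DISTINCT ...), so I don't know what else to try.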
I don't think there is anything wrong with GROUP BY ROLLUP here.
You can use ARRAY_AGG() to see what is actually being counted:
with tb1 as
(
select 'DN' as geography
,'1012658993824' as SKU
,1 as pageview
union all
select 'KR' as geography
,'1012658993824' as SKU
,7 as pageview
)
select geography
,count(distinct(case when pageview between 1 and 5 then SKU end)) as PV_from_0_to_5
,count(distinct(case when pageview between 6 and 10 then SKU end)) as PV_from_6_to_10
, ARRAY_AGG(case when pageview between 1 and 5 then SKU end IGNORE NULLS) PV_from_0_to_5_agg
, ARRAY_AGG(case when pageview between 6 and 10 then SKU end IGNORE NULLS) PV_from_6_to_10_agg
from tb1
group by rollup (geography)
The sample query just happens to use the same ID for two different rows.
Like Felipe, I don't see a problem with GROUP BY ROLLUP either. To make this clearer, I wrote the following query:
with tb1 as
(
select 'DN' as geography
,'1012658993824' as SKU
,1 as pageview
union all
select 'KR' as geography
,'1012658993825' as SKU
,7 as pageview
)
select geography
,sum(pageview) as addition
,count(pageview) as count
,count(distinct(case when pageview between 1 and 5 then SKU end)) as PV_from_0_to_5
,count(distinct(case when pageview between 6 and 10 then SKU end)) as PV_from_6_to_10
, ARRAY_AGG(case when pageview between 1 and 5 then SKU end IGNORE NULLS) PV_from_0_to_5_agg
, ARRAY_AGG(case when pageview between 6 and 10 then SKU end IGNORE NULLS) PV_from_6_to_10_agg
from tb1
group by rollup (geography)
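You will see that the first row (the rollup total, where geography is NULL) has addition 8 and count 2. That means the total row really does contain two separate input rows, one with pageview 1 and one with pageview 7, and the CASE expression is evaluated against each of them individually rather than against their sum. So the result you are getting is the expected one.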
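To make the per-row evaluation visible, here is a small sketch that runs the two CASE expressions without any aggregation, using the sample data from the question. The output column names `sku_if_1_to_5` and `sku_if_6_to_10` are made up for this illustration.

```sql
-- Same sample data as in the question: one SKU seen in two geographies.
with tb1 as
(
select 'DN' as geography
,'1012658993824' as SKU
,1 as pageview
union all
select 'KR' as geography
,'1012658993824' as SKU
,7 as pageview
)
-- No GROUP BY: each CASE is evaluated once per input row.
select geography
,pageview
,case when pageview between 1 and 5 then SKU end as sku_if_1_to_5   -- illustrative name
,case when pageview between 6 and 10 then SKU end as sku_if_6_to_10 -- illustrative name
from tb1
```

The DN row yields the SKU only in the first column and the KR row only in the second. The rollup total therefore sees one non-NULL SKU in each column, which is why COUNT(DISTINCT ...) returns 1 for both buckets.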
If you want the result you expected, you need to test the summed page views explicitly. That is:
with tb1 as
(
select 'DN' as geography
,'1012658993824' as SKU
,1 as pageview
union all
select 'KR' as geography
,'1012658993825' as SKU
,7 as pageview
)
select geography,sum(pageview) as addition
,CAST(sum(pageview) between 1 and 5 AS INT64) as PV_from_0_to_5 -- inclusive bounds, same buckets as in the question
,CAST(sum(pageview) between 6 and 10 AS INT64) as PV_from_6_to_10
from tb1
group by rollup (geography)
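Note that the query above only flags whether a group's total falls into a bucket, so it stops being a SKU count as soon as a geography has more than one SKU. One possible way to keep counting SKUs, sketched here under the assumption that `geography` is never NULL in the source data, is to sum the page views per SKU first (once per geography and once overall) and then count the SKUs in each bucket. The CTE name `per_sku` and the column `total_pv` are made up for this example, and the total row is built with UNION ALL instead of ROLLUP.

```sql
-- Sample data from the question: one SKU seen in two geographies.
with tb1 as
(
select 'DN' as geography
,'1012658993824' as SKU
,1 as pageview
union all
select 'KR' as geography
,'1012658993824' as SKU
,7 as pageview
)
,per_sku as -- illustrative name: one row per (geography, SKU), plus one row per SKU overall
(
select geography
,SKU
,sum(pageview) as total_pv
from tb1
group by geography, SKU
union all
-- Total level: geography is NULL and page views are summed across all geographies.
select cast(null as string) as geography
,SKU
,sum(pageview) as total_pv
from tb1
group by SKU
)
-- per_sku holds each SKU at most once per geography, so COUNTIF counts distinct SKUs.
select geography
,countif(total_pv between 1 and 5) as PV_from_0_to_5
,countif(total_pv between 6 and 10) as PV_from_6_to_10
from per_sku
group by geography
```

With the sample data, the NULL row is computed from the combined total of 8 page views, so the SKU lands in the 6 to 10 bucket there, while the DN and KR rows are unchanged.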