Problem with COUNT(DISTINCT CASE ...) when using GROUP BY ROLLUP in BigQuery


I'm having trouble with COUNT(DISTINCT CASE ...) when using GROUP BY ROLLUP in BigQuery. The result is not what I expected. Here is the code:

with tb1 as
(
select 'DN' as geography
       ,'1012658993824' as SKU
       ,1 as pageview

union all

select 'KR' as geography
       ,'1012658993824' as SKU
       ,7 as pageview

)
select geography
       ,count(distinct(case when  pageview between 1 and 5 then SKU end)) as PV_from_0_to_5
       ,count(distinct(case when  pageview between 6 and 10 then SKU end)) as PV_from_6_to_10
from tb1
group by rollup (1)

**Output:**
geography/ PV_from_0_to_5/ PV_from_6_to_10
NULL 1 1
DN 1 0
KR 0 1

**Expected Output**
geography/ PV_from_0_to_5/ PV_from_6_to_10
NULL 0 1
DN 1 0
KR 0 1

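Explanation: I'm counting pageviews for a single SKU across two geographies, DN and KR. The problem is that on the rollup row I want the pageviews to be summed first (1 + 7 = 8 pageviews) before the distinct SKUs are counted. I can't put SUM inside COUNT(DISTINCT ...), so I don't know what else to try.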

google-bigquery
2 Answers
0 votes

I think this GROUP BY ROLLUP is working correctly.

You can use ARRAY_AGG() to see what is actually being counted:

with tb1 as
(
select 'DN' as geography
       ,'1012658993824' as SKU
       ,1 as pageview

union all

select 'KR' as geography
       ,'1012658993824' as SKU
       ,7 as pageview

)
select geography
       ,count(distinct(case when  pageview between 1 and 5 then SKU end)) as PV_from_0_to_5
       ,count(distinct(case when  pageview between 6 and 10 then SKU end)) as PV_from_6_to_10
       , ARRAY_AGG(case when  pageview between 1 and 5 then SKU end IGNORE NULLS) PV_from_0_to_5_agg
       , ARRAY_AGG(case when  pageview between 6 and 10 then SKU end IGNORE NULLS) PV_from_6_to_10_agg
from tb1
group by rollup (geography)


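The sample query happens to use the same SKU on two different rows, so on the rollup row that SKU falls into both pageview buckets.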
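To make that concrete, the rollup row (geography = NULL) is computed over every input row, so it should match the same aggregates run with no GROUP BY at all. A minimal sketch reusing the question's sample data:

```sql
-- Same sample data as in the question.
WITH tb1 AS (
  SELECT 'DN' AS geography, '1012658993824' AS SKU, 1 AS pageview
  UNION ALL
  SELECT 'KR' AS geography, '1012658993824' AS SKU, 7 AS pageview
)
-- No GROUP BY: the aggregates see both rows, just like the rollup total row.
SELECT
  COUNT(DISTINCT CASE WHEN pageview BETWEEN 1 AND 5 THEN SKU END) AS PV_from_0_to_5,
  COUNT(DISTINCT CASE WHEN pageview BETWEEN 6 AND 10 THEN SKU END) AS PV_from_6_to_10
FROM tb1
```

Both columns should come back as 1 here, because the single SKU has one row in each pageview bucket.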


0 votes

Like Felipe, I don't see a problem with GROUP BY ROLLUP either. To make things clearer, I wrote the following query:

with tb1 as
(
select 'DN' as geography
       ,'1012658993824' as SKU
       ,1 as pageview

union all

select 'KR' as geography
       ,'1012658993825' as SKU
       ,7 as pageview

)
select geography
       ,sum(pageview) as addition
       ,count(pageview) as count
       ,count(distinct(case when  pageview between 1 and 5 then SKU end)) as PV_from_0_to_5
       ,count(distinct(case when  pageview between 6 and 10 then SKU end)) as PV_from_6_to_10
       , ARRAY_AGG(case when  pageview between 1 and 5 then SKU end IGNORE NULLS) PV_from_0_to_5_agg
       , ARRAY_AGG(case when  pageview between 6 and 10 then SKU end IGNORE NULLS) PV_from_6_to_10_agg

from tb1
group by rollup (geography)

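You will see that the first row has an addition of 8 and a count of 2. That means the rollup row really contains both the pageview = 1 row and the pageview = 7 row, and each row is tested against the BETWEEN ranges on its own. So the result you got is the expected one.

If you want the result you described, you need to bucket on the summed pageviews explicitly. That is: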

with tb1 as
(
select 'DN' as geography
       ,'1012658993824' as SKU
       ,1 as pageview

union all

select 'KR' as geography
       ,'1012658993825' as SKU
       ,7 as pageview

)
select geography,sum(pageview) as addition
       ,CAST(sum(pageview) between 1 and 5 AS INT64) as PV_from_0_to_5
       ,CAST(sum(pageview) between 6 and 10 AS INT64) as PV_from_6_to_10

from tb1
group by rollup (geography)
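
Note that the query above buckets the total pageviews of each geography rather than counting SKUs. If the goal is still a distinct SKU count, one possible approach (a sketch, not taken from either answer) is to sum pageviews per SKU at each level first and only then count the SKUs by bucket. The names per_sku and total_pv are made up for illustration, and the sketch assumes geography is never NULL in the source data, since NULL is used to mark the total row.

```sql
WITH tb1 AS (
  SELECT 'DN' AS geography, '1012658993824' AS SKU, 1 AS pageview
  UNION ALL
  SELECT 'KR' AS geography, '1012658993824' AS SKU, 7 AS pageview
),
per_sku AS (
  -- Level 1: pageviews per SKU within each geography.
  SELECT geography, SKU, SUM(pageview) AS total_pv
  FROM tb1
  GROUP BY geography, SKU
  UNION ALL
  -- Level 2: pageviews per SKU across all geographies (the "rollup" level).
  SELECT CAST(NULL AS STRING) AS geography, SKU, SUM(pageview) AS total_pv
  FROM tb1
  GROUP BY SKU
)
-- Count distinct SKUs by the bucket their summed pageviews fall into.
SELECT
  geography,
  COUNT(DISTINCT CASE WHEN total_pv BETWEEN 1 AND 5 THEN SKU END) AS PV_from_0_to_5,
  COUNT(DISTINCT CASE WHEN total_pv BETWEEN 6 AND 10 THEN SKU END) AS PV_from_6_to_10
FROM per_sku
GROUP BY geography
ORDER BY geography
```

With the question's sample data, the NULL row should show 0 and 1, because the SKU's combined total of 8 pageviews lands only in the 6 to 10 bucket. The DN and KR rows are unchanged.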
