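I want to get the "top row" of each group, along with an aggregate metric computed across the whole group.
Below is a concrete example in which I solve the problem with a join.
Sample data: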
create or replace table TABLE_ID
(
  fruit string,
  store string,
  state string,
  cost numeric
);
insert into TABLE_ID
values
  ('apple', 'Whole Foods', 'CA', 28.3),
  ('apple', 'Walmart', 'UT', 3.2),
  ('apple', 'Whole Foods', 'AZ', 4.4),
  ('apple', 'Walmart', 'NY', 5.1),
  ('banana', 'Whole Foods', 'CO', 2.3),
  ('banana', 'Whole Foods', 'AZ', 28.8),
  ('banana', 'Walmart', 'NY', 93.3),
  ('banana', 'Whole Foods', 'NY', 20.1);
Solution:
select b.*, a.total_cost
from (
  select
    fruit, sum(cost) as total_cost
  from TABLE_ID
  group by fruit
) a
left join
(
  select fruit, store as top_purchase_store, state as top_purchase_state
  from TABLE_ID
  qualify row_number() over (partition by fruit order by cost desc) = 1
) b
on a.fruit = b.fruit
;
Output:
total_cost | fruit | top_purchase_store | top_purchase_state |
---|---|---|---|
41 | apple | Whole Foods | CA |
144.5 | banana | Walmart | NY |
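I feel this should be possible without a join. However, I have not been able to combine first_value with the sum aggregate in the way I need.
Do you have any other suggestions?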
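You can also compute sum() as a window function with partition by, so the total is available on every row of the group. I confirmed that this works in BigQuery.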
select total_cost, fruit, top_purchase_store, top_purchase_state
from (
  select fruit, store as top_purchase_store, state as top_purchase_state,
    row_number() over (partition by fruit order by cost desc) as rn,
    sum(cost) over (partition by fruit) as total_cost
  from TABLE_ID
) z
where rn = 1;
total_cost | fruit | top_purchase_store | top_purchase_state |
---|---|---|---|
41.0 | apple | Whole Foods | CA |
144.5 | banana | Walmart | NY |
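
Two further variants may be worth trying. They are sketches rather than tested answers, and they assume the same TABLE_ID sample table from the question. The first drops the subquery by filtering with qualify, which the question already uses. The second pairs first_value with a windowed sum, which is the combination the question mentions. Both rely on window functions being computed over the whole partition before any rows are filtered out.

```sql
-- Variant 1: filter with qualify instead of wrapping the query in a subquery.
-- The windowed sum is computed before qualify removes rows, so total_cost
-- still covers every row of the fruit, not only the row that is kept.
select
  sum(cost) over (partition by fruit) as total_cost,
  fruit,
  store as top_purchase_store,
  state as top_purchase_state
from TABLE_ID
qualify row_number() over (partition by fruit order by cost desc) = 1;

-- Variant 2: first_value next to a windowed sum.
-- Every row of a fruit ends up with the same four values, so
-- select distinct collapses each fruit to a single row.
select distinct
  sum(cost) over (partition by fruit) as total_cost,
  fruit,
  first_value(store) over (partition by fruit order by cost desc) as top_purchase_store,
  first_value(state) over (partition by fruit order by cost desc) as top_purchase_state
from TABLE_ID;
```

On the sample data, both should return the same two rows as the query above. If two rows of the same fruit can tie for the highest cost, the row that is picked is not deterministic, so consider adding a tiebreaker column to the order by.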