Selecting the SUM and COUNT of child rows in an aggregate SQL query


Background

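I'm developing an interface that displays summary statistics of user actions related to promotions. The database that stores these actions is ClickHouse, chosen for large-scale aggregate statistics. As far as I know, its query syntax closely mirrors regular MySQL.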

The relevant tables look like this:

page_visit_actions

  • promotion_id
  • action_type
  • campaign_id
  • ...

purchases

  • promotion_id
  • price
  • campaign_id
  • ...

Task

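I need to display the number of certain actions performed, aggregated by an arbitrary column of the page_visit_actions table. For example, this could be the number of product shares per promotion or per date.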

Minimal setup

CREATE TABLE page_visit_actions (
    promotion_id VARCHAR(100),
    action_type VARCHAR(100),
    campaign_id int UNSIGNED
);

CREATE TABLE purchases (
    promotion_id VARCHAR(100),
    price FLOAT(10, 2),
    campaign_id int UNSIGNED
);


INSERT INTO page_visit_actions (promotion_id, action_type, campaign_id) VALUES ('5ac4ea14-f157-45a0-a706-765f4c624920', 'share_product', 17);
INSERT INTO page_visit_actions (promotion_id, action_type, campaign_id) VALUES ('5ac4ea14-f157-45a0-a706-765f4c624920', 'add_product_to_favorites', 17);
INSERT INTO page_visit_actions (promotion_id, action_type, campaign_id) VALUES ('7140beb4-5fc9-46b1-9790-0b580b6c83f3', 'add_product_to_favorites', 17);
INSERT INTO page_visit_actions (promotion_id, action_type, campaign_id) VALUES ('97a207e6-c9f8-4edf-8f9f-397544c790b9', 'add_product_to_favorites', 17);

INSERT INTO purchases (promotion_id, price, campaign_id) VALUES ('5ac4ea14-f157-45a0-a706-765f4c624920', 15.0, 17);
INSERT INTO purchases (promotion_id, price, campaign_id) VALUES ('5ac4ea14-f157-45a0-a706-765f4c624920', 5.0, 17);
INSERT INTO purchases (promotion_id, price, campaign_id) VALUES ('97a207e6-c9f8-4edf-8f9f-397544c790b9', 5.0, 17);

Query

I started with this simple aggregate query to get the actions per promotion:

select 
promotion_id, 
count(*) as total, 
sum(case when action_type = 'share_product' then 1 else 0 end) as share, 
sum(case when action_type = 'add_product_to_favorites' then 1 else 0 end) as favorite, 
sum(case when action_type = 'add_product_to_basket' then 1 else 0 end) as basket 
from page_visit_actions
group by promotion_id;

The query works as expected:

+--------------------------------------+-------+-------+----------+--------+
| promotion_id                         | total | share | favorite | basket |
+--------------------------------------+-------+-------+----------+--------+
| 5ac4ea14-f157-45a0-a706-765f4c624920 |     2 |     1 |        1 |      0 |
| 7140beb4-5fc9-46b1-9790-0b580b6c83f3 |     1 |     0 |        1 |      0 |
| 97a207e6-c9f8-4edf-8f9f-397544c790b9 |     1 |     0 |        1 |      0 |
+--------------------------------------+-------+-------+----------+--------+
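For experimenting without a ClickHouse or MySQL server, the setup and the first query can be reproduced with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for ClickHouse/MySQL, column types are simplified to TEXT and INTEGER, and an ORDER BY is added so the row order is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for ClickHouse/MySQL; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_visit_actions (promotion_id TEXT, action_type TEXT, campaign_id INTEGER);
INSERT INTO page_visit_actions (promotion_id, action_type, campaign_id) VALUES
  ('5ac4ea14-f157-45a0-a706-765f4c624920', 'share_product', 17),
  ('5ac4ea14-f157-45a0-a706-765f4c624920', 'add_product_to_favorites', 17),
  ('7140beb4-5fc9-46b1-9790-0b580b6c83f3', 'add_product_to_favorites', 17),
  ('97a207e6-c9f8-4edf-8f9f-397544c790b9', 'add_product_to_favorites', 17);
""")

# Conditional aggregation: one output row per promotion, one column per action type.
rows = conn.execute("""
SELECT promotion_id,
       count(*) AS total,
       sum(CASE WHEN action_type = 'share_product' THEN 1 ELSE 0 END) AS share,
       sum(CASE WHEN action_type = 'add_product_to_favorites' THEN 1 ELSE 0 END) AS favorite,
       sum(CASE WHEN action_type = 'add_product_to_basket' THEN 1 ELSE 0 END) AS basket
FROM page_visit_actions
GROUP BY promotion_id
ORDER BY promotion_id
""").fetchall()

for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of the table above.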

Problem

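However, when I was asked to add the number of purchases and the revenue to the summary statistics, I ran into problems. I tried the following query: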

select 
page_visit_actions.promotion_id, 
count(page_visit_actions.promotion_id) as total, 
sum(case when action_type = 'share_product' then 1 else 0 end) as share, 
sum(case when action_type = 'add_product_to_favorites' then 1 else 0 end) as favorite, 
sum(case when action_type = 'add_product_to_basket' then 1 else 0 end) as basket,
count(purchases.promotion_id) as purchases,
sum(purchases.price) as revenue
from page_visit_actions
left join purchases on page_visit_actions.promotion_id=purchases.promotion_id
group by page_visit_actions.promotion_id;

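I can join the tables on the promotion_id column that page_visit_actions and purchases have in common, but this has the side effect of corrupting the counts. Several actions can reference the same promotion (promotions are offered per user, but a user can both share and favorite a promotion, which produces 2 actions referencing the same promotion), so the same purchase ends up joined to multiple actions.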

As a result, my query returns:

+--------------------------------------+-------+-------+----------+--------+-----------+---------+
| promotion_id                         | total | share | favorite | basket | purchases | revenue |
+--------------------------------------+-------+-------+----------+--------+-----------+---------+
| 5ac4ea14-f157-45a0-a706-765f4c624920 |     4 |     2 |        2 |      0 |         4 |   40.00 |
| 7140beb4-5fc9-46b1-9790-0b580b6c83f3 |     1 |     0 |        1 |      0 |         0 |    NULL |
| 97a207e6-c9f8-4edf-8f9f-397544c790b9 |     1 |     0 |        1 |      0 |         1 |    5.00 |
+--------------------------------------+-------+-------+----------+--------+-----------+---------+
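The fan-out can be reproduced in plain Python, with no database involved. This sketch emulates the LEFT JOIN on promotion_id over the sample rows and then counts the joined rows and sums the prices per promotion. Promotion IDs are shortened to their first 8 characters for readability; that shortening is an assumption of this example only.

```python
from collections import defaultdict

# Sample rows from the minimal setup (promotion IDs shortened to 8 characters).
actions = [
    ("5ac4ea14", "share_product"),
    ("5ac4ea14", "add_product_to_favorites"),
    ("7140beb4", "add_product_to_favorites"),
    ("97a207e6", "add_product_to_favorites"),
]
purchases = [("5ac4ea14", 15.0), ("5ac4ea14", 5.0), ("97a207e6", 5.0)]

# Emulate: page_visit_actions LEFT JOIN purchases ON promotion_id
joined = []
for pid, action in actions:
    matches = [price for p, price in purchases if p == pid]
    if matches:
        # every matching purchase is paired with this action row
        joined.extend((pid, action, price) for price in matches)
    else:
        joined.append((pid, action, None))  # unmatched left row, price is NULL

rows_per_promotion = defaultdict(int)
revenue = defaultdict(float)
for pid, _action, price in joined:
    rows_per_promotion[pid] += 1
    if price is not None:
        revenue[pid] += price

print(dict(rows_per_promotion))  # {'5ac4ea14': 4, '7140beb4': 1, '97a207e6': 1}
print(dict(revenue))             # {'5ac4ea14': 40.0, '97a207e6': 5.0}
```

The first promotion has 2 actions and 2 purchases, so the join yields 2 × 2 = 4 rows and every price is counted twice, which is exactly where the 4 and 40.00 in the incorrect result come from.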

However, the correct statistics should look like this:

+--------------------------------------+-------+-------+----------+--------+-----------+---------+
| promotion_id                         | total | share | favorite | basket | purchases | revenue |
+--------------------------------------+-------+-------+----------+--------+-----------+---------+
| 5ac4ea14-f157-45a0-a706-765f4c624920 |     2 |     1 |        1 |      0 |         2 |   20.00 |
| 7140beb4-5fc9-46b1-9790-0b580b6c83f3 |     1 |     0 |        1 |      0 |         0 |    NULL |
| 97a207e6-c9f8-4edf-8f9f-397544c790b9 |     1 |     0 |        1 |      0 |         1 |    5.00 |
+--------------------------------------+-------+-------+----------+--------+-----------+---------+

Question

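How can I get the count() and sum() of the child rows (purchases) related to the aggregated table (page_visit_actions) without corrupting the original grouped statistics?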

sql count sum aggregate clickhouse
Answer

You need to aggregate the two tables separately, because each of them has a one-to-many relationship with promotion (a table you haven't shown).

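Joining everything together and then aggregating is a common mistake in aggregate queries. It gives wrong results because the join produces a Cartesian product between the child tables: for each promotion_id, every action row is paired with every purchase row. A promotion with 2 actions and 2 purchases therefore yields 4 joined rows, and both the action counts and the revenue are doubled.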
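The aggregate-first idea can be sketched in plain Python before translating it to SQL: build one summary per promotion from each child table independently, then combine the summaries by key. The sample rows are those of the minimal setup with promotion IDs shortened to 8 characters; this is an illustration of the technique, not the answerer's code.

```python
from collections import Counter

# Sample rows from the minimal setup (promotion IDs shortened to 8 characters).
actions = [
    ("5ac4ea14", "share_product"),
    ("5ac4ea14", "add_product_to_favorites"),
    ("7140beb4", "add_product_to_favorites"),
    ("97a207e6", "add_product_to_favorites"),
]
purchases = [("5ac4ea14", 15.0), ("5ac4ea14", 5.0), ("97a207e6", 5.0)]

# Step 1: aggregate actions per promotion (one Counter per promotion_id).
action_stats = {}
for pid, action in actions:
    stats = action_stats.setdefault(pid, Counter())
    stats["total"] += 1
    stats[action] += 1

# Step 2: aggregate purchases per promotion, independently of the actions.
purchase_stats = {}
for pid, price in purchases:
    count, revenue = purchase_stats.get(pid, (0, 0.0))
    purchase_stats[pid] = (count + 1, revenue + price)

# Step 3: "left join" the two one-row-per-promotion summaries on promotion_id.
result = {}
for pid, stats in action_stats.items():
    bought = purchase_stats.get(pid)  # None when the promotion has no purchases
    result[pid] = (
        stats["total"],
        stats["share_product"],
        stats["add_product_to_favorites"],
        stats["add_product_to_basket"],  # Counter returns 0 for missing keys
        bought[0] if bought else 0,
        bought[1] if bought else None,
    )

for pid, row in result.items():
    print(pid, row)
```

Because each summary already has exactly one entry per promotion, combining them cannot multiply any counts. The SQL below does the same thing with two derived tables joined on promotion_id.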

select 
  pva.*,
  coalesce(p.purchases, 0) as purchases,  -- promotions without purchases report 0 instead of NULL
  p.revenue
from (
    -- aggregate actions: one row per promotion
    select
      pva.promotion_id,
      count(*) as total,
      count(case when pva.action_type = 'share_product' then 1 end) as share,
      count(case when pva.action_type = 'add_product_to_favorites' then 1 end) as favorite,
      count(case when pva.action_type = 'add_product_to_basket' then 1 end) as basket
    from page_visit_actions pva
    group by
      pva.promotion_id
) pva
left join (
    -- aggregate purchases separately: one row per promotion
    select
      p.promotion_id,
      count(*) as purchases,
      sum(p.price) as revenue
    from purchases p
    group by
      p.promotion_id
) p on pva.promotion_id = p.promotion_id;

db<>fiddle
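To check the aggregate-then-join approach against the sample data, the following sketch runs it in an in-memory SQLite database via Python's sqlite3 module. Assumptions: SQLite stands in for ClickHouse/MySQL, the action count is aliased total and the purchase count is wrapped in coalesce(..., 0) so the output lines up with the expected table, and an ORDER BY is added for a stable row order.

```python
import sqlite3

# Assumption: SQLite stands in for ClickHouse/MySQL; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_visit_actions (promotion_id TEXT, action_type TEXT, campaign_id INTEGER);
CREATE TABLE purchases (promotion_id TEXT, price REAL, campaign_id INTEGER);
INSERT INTO page_visit_actions (promotion_id, action_type, campaign_id) VALUES
  ('5ac4ea14-f157-45a0-a706-765f4c624920', 'share_product', 17),
  ('5ac4ea14-f157-45a0-a706-765f4c624920', 'add_product_to_favorites', 17),
  ('7140beb4-5fc9-46b1-9790-0b580b6c83f3', 'add_product_to_favorites', 17),
  ('97a207e6-c9f8-4edf-8f9f-397544c790b9', 'add_product_to_favorites', 17);
INSERT INTO purchases (promotion_id, price, campaign_id) VALUES
  ('5ac4ea14-f157-45a0-a706-765f4c624920', 15.0, 17),
  ('5ac4ea14-f157-45a0-a706-765f4c624920', 5.0, 17),
  ('97a207e6-c9f8-4edf-8f9f-397544c790b9', 5.0, 17);
""")

# Aggregate each child table on its own, then join the one-row-per-promotion results.
rows = conn.execute("""
SELECT pva.*,
       coalesce(p.purchases, 0) AS purchases,
       p.revenue
FROM (
    SELECT promotion_id,
           count(*) AS total,
           count(CASE WHEN action_type = 'share_product' THEN 1 END) AS share,
           count(CASE WHEN action_type = 'add_product_to_favorites' THEN 1 END) AS favorite,
           count(CASE WHEN action_type = 'add_product_to_basket' THEN 1 END) AS basket
    FROM page_visit_actions
    GROUP BY promotion_id
) pva
LEFT JOIN (
    SELECT promotion_id, count(*) AS purchases, sum(price) AS revenue
    FROM purchases
    GROUP BY promotion_id
) p ON pva.promotion_id = p.promotion_id
ORDER BY pva.promotion_id
""").fetchall()

for row in rows:
    print(row)
```

One caveat worth verifying on the real database: in ClickHouse, columns from the right side of a LEFT JOIN that have no match are filled with the column type's default value (for example 0) rather than NULL unless the join_use_nulls setting is enabled, so revenue may show 0 instead of NULL there.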

© www.soinside.com 2019 - 2024. All rights reserved.