Understanding EXPLAIN when using CTEs: trying to get a query to compute


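I've been struggling with a query and have tried several variations to get the result I need, without success. I'm hoping that if I share the variations I tried, along with their explain output, someone might have a pointer.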

Postgres 11.6.

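In the code blocks below, dimension1 is a field that exists on every table I reference. The date column exists only in the sessions table, so to pull data for a specific date I created a CTE, filter_sessions, that returns only the dimension1 values appearing on the given date, and I then join it to the other tables. This lets my query select data for a specific date, in this case February 6.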

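Here is my initial attempt. It uses CTEs, which I prefer for readability, and if I could get it to run it would save me the hassle of writing more code, but it will not run: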

with 

filter_sessions as (
select 
    dimension1,
    dimension2,
    date,
    channel_grouping,
    device_category,
    user_type
from ga_flagship_ecom.sessions
where date >= '2020-02-06'
and date <= '2020-02-06'
),

ee as (
select 
    e.dimension1,
    e.dimension3,
    case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level

    -- approximation for inferring if the product is a download and hence sees all the checkout steps
    case when sum(case when lower(product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
from ga_flagship_ecom.ecom e
join filter_sessions f on f.dimension1 = e.dimension1
group by 1,2
),

ecom_events as (
select 
    ev.dimension1,
    ev.dimension3,
    ev.event_action,
    ev.event_label,
    ee.zero_val_product,
    ee.download
from ga_flagship_ecom.events ev 
join ee on ee.dimension1 = ev.dimension1 and ee.dimension3 = ev.dimension3
where ev.event_category = 'ecom'
)

select 
    s.date,
    lower(s.channel_grouping) as channel_grouping,
    lower(s.device_category) as device_category,
    lower(s.user_type) as user_type,
    lower(ev.event_action) as event_action,
    lower(coalesce(ev.event_label, 'na')) as event_label,
    ev.zero_val_product,
    ev.download,
    count(distinct s.dimension1) as sessions,
    count(distinct s.dimension2) as daily_users
from filter_sessions s
join ecom_events ev on ev.dimension1 = s.dimension1
group by 1,2,3,4,5,6,7,8;

Here is the explain output for that query:

GroupAggregate  (cost=222818.83..222818.88 rows=1 width=188)
  Group Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
  CTE filter_sessions
    ->  Index Scan using sessions_date_idx on sessions  (cost=0.56..2.78 rows=1 width=76)
          Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
  CTE ee
    ->  GroupAggregate  (cost=47604.61..47606.29 rows=48 width=38)
          Group Key: e.dimension1, e.dimension3
          ->  Sort  (cost=47604.61..47604.73 rows=48 width=51)
                Sort Key: e.dimension1, e.dimension3
                ->  Nested Loop  (cost=0.56..47603.27 rows=48 width=51)
                      ->  CTE Scan on filter_sessions f  (cost=0.00..0.02 rows=1 width=32)
                      ->  Index Scan using ecom_dimension1_idx on ecom e  (cost=0.56..47602.77 rows=48 width=51)
                            Index Cond: ((dimension1)::text = (f.dimension1)::text)
  CTE ecom_events
    ->  Hash Join  (cost=1.68..175209.67 rows=1 width=60)
          Hash Cond: (((ev_1.dimension1)::text = (ee.dimension1)::text) AND (ev_1.dimension3 = ee.dimension3))
          ->  Seq Scan on events ev_1  (cost=0.00..150210.69 rows=3332973 width=52)
                Filter: ((event_category)::text = 'ecom'::text)
          ->  Hash  (cost=0.96..0.96 rows=48 width=48)
                ->  CTE Scan on ee  (cost=0.00..0.96 rows=48 width=48)
  ->  Sort  (cost=0.08..0.08 rows=1 width=236)
        Sort Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
        ->  Nested Loop  (cost=0.00..0.07 rows=1 width=236)
              Join Filter: ((s.dimension1)::text = (ev.dimension1)::text)
              ->  CTE Scan on filter_sessions s  (cost=0.00..0.02 rows=1 width=164)
              ->  CTE Scan on ecom_events ev  (cost=0.00..0.02 rows=1 width=104)

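It was suggested that CTE ee is my bottleneck and that I should focus there. I tried using a subquery inside CTE ee instead of referencing CTE filter_sessions, so I changed it to: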

ee as (
select 
    e.dimension1,
    e.dimension3,
    case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level

    -- approximation for inferring if the product is a download and hence sees all the checkout steps
    case when sum(case when lower(product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
from ga_flagship_ecom.ecom e
--join filter_sessions f on f.dimension1 = e.dimension1
join (select dimension1 from ga_flagship_ecom.sessions where date >= '2020-02-06' and date <= '2020-02-06') f
    on f.dimension1 = e.dimension1
group by 1,2
),

This changed the plan slightly:

GroupAggregate  (cost=107619.19..107619.24 rows=1 width=188)
  Group Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
  CTE filter_sessions
    ->  Index Scan using sessions_date_idx on sessions  (cost=0.56..2.78 rows=1 width=76)
          Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
  CTE ee
    ->  GroupAggregate  (cost=47606.05..47606.08 rows=1 width=38)
          Group Key: e.dimension1, e.dimension3
          ->  Sort  (cost=47606.05..47606.05 rows=1 width=51)
                Sort Key: e.dimension1, e.dimension3
                ->  Nested Loop  (cost=1.12..47606.04 rows=1 width=51)
                      ->  Index Only Scan using sessions_date_idx on sessions sessions_1  (cost=0.56..2.78 rows=1 width=22)
                            Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
                      ->  Index Scan using ecom_dimension1_idx on ecom e  (cost=0.56..47602.77 rows=48 width=51)
                            Index Cond: ((dimension1)::text = (sessions_1.dimension1)::text)
  CTE ecom_events
    ->  Nested Loop  (cost=0.56..60010.25 rows=1 width=60)
          ->  CTE Scan on ee  (cost=0.00..0.02 rows=1 width=48)
          ->  Index Scan using events_pk on events ev_1  (cost=0.56..60010.22 rows=1 width=52)
                Index Cond: (((dimension1)::text = (ee.dimension1)::text) AND (dimension3 = ee.dimension3))
                Filter: ((event_category)::text = 'ecom'::text)
  ->  Sort  (cost=0.08..0.08 rows=1 width=236)
        Sort Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
        ->  Nested Loop  (cost=0.00..0.07 rows=1 width=236)
              Join Filter: ((s.dimension1)::text = (ev.dimension1)::text)
              ->  CTE Scan on filter_sessions s  (cost=0.00..0.02 rows=1 width=164)
              ->  CTE Scan on ecom_events ev  (cost=0.00..0.02 rows=1 width=104)

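I'm not sure how to interpret the numbers in the explain output, but for CTE ee they are practically the same, so I assume the change did not make much difference? CTE ee -> GroupAggregate (cost=47606.05..47606.08 rows=1 width=38)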
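A note on reading these numbers, offered as a sketch rather than a diagnosis: in EXPLAIN output, cost=A..B is the planner's estimated startup and total cost in arbitrary units, and rows is an estimated row count, not a measurement. Every plan above estimates rows=1 for the date filter on sessions, so one thing that may be worth checking is whether that estimate is realistic. The statements below assume only the table and column names already used in the question.

```sql
-- Actual number of session rows for the day;
-- compare this with the planner's rows=1 estimate in the plans above.
select count(*) as actual_rows
from ga_flagship_ecom.sessions
where date >= '2020-02-06'
and date <= '2020-02-06';

-- Planner's estimate for the same filter.
-- Plain EXPLAIN only plans the query; it does not execute it.
explain
select dimension1
from ga_flagship_ecom.sessions
where date >= '2020-02-06'
and date <= '2020-02-06';

-- If the two numbers differ widely, refreshing the table statistics
-- may give the planner a better estimate.
analyze ga_flagship_ecom.sessions;
```

If the actual count turns out to be far larger than the estimate, the nested loops in these plans would run many more times than the planner expected, which could be one reason the query appears never to finish.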

Either way, the query still does not complete. Other things I have tried (all failed; the query runs indefinitely):

Instead of an inner join, a where filter like this:

ee as (
select 
    e.dimension1,
    e.dimension3,
    case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level

    -- approximation for inferring if the product is a download and hence sees all the checkout steps
    case when sum(case when lower(product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
from ga_flagship_ecom.ecom e
--join filter_sessions f on f.dimension1 = e.dimension1
where e.dimension1 in (select dimension1 from filter_sessions)
group by 1,2
),

Here is the explain output when using the where filter instead of the inner join:

GroupAggregate  (cost=222818.84..222818.89 rows=1 width=188)
  Group Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
  CTE filter_sessions
    ->  Index Scan using sessions_date_idx on sessions  (cost=0.56..2.78 rows=1 width=76)
          Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
  CTE ee
    ->  GroupAggregate  (cost=47604.63..47606.31 rows=48 width=38)
          Group Key: e.dimension1, e.dimension3
          ->  Sort  (cost=47604.63..47604.75 rows=48 width=51)
                Sort Key: e.dimension1, e.dimension3
                ->  Nested Loop  (cost=0.58..47603.29 rows=48 width=51)
                      ->  HashAggregate  (cost=0.02..0.03 rows=1 width=32)
                            Group Key: (filter_sessions.dimension1)::text
                            ->  CTE Scan on filter_sessions  (cost=0.00..0.02 rows=1 width=32)
                      ->  Index Scan using ecom_dimension1_idx on ecom e  (cost=0.56..47602.77 rows=48 width=51)
                            Index Cond: ((dimension1)::text = (filter_sessions.dimension1)::text)
  CTE ecom_events
    ->  Hash Join  (cost=1.68..175209.67 rows=1 width=60)
          Hash Cond: (((ev_1.dimension1)::text = (ee.dimension1)::text) AND (ev_1.dimension3 = ee.dimension3))
          ->  Seq Scan on events ev_1  (cost=0.00..150210.69 rows=3332973 width=52)
                Filter: ((event_category)::text = 'ecom'::text)
          ->  Hash  (cost=0.96..0.96 rows=48 width=48)
                ->  CTE Scan on ee  (cost=0.00..0.96 rows=48 width=48)
  ->  Sort  (cost=0.08..0.08 rows=1 width=236)
        Sort Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
        ->  Nested Loop  (cost=0.00..0.07 rows=1 width=236)
              Join Filter: ((s.dimension1)::text = (ev.dimension1)::text)
              ->  CTE Scan on filter_sessions s  (cost=0.00..0.02 rows=1 width=164)
              ->  CTE Scan on ecom_events ev  (cost=0.00..0.02 rows=1 width=104)

Then I tried splitting CTE ee into two parts, like this:

ee_base as (
select 
    e.dimension1,
    e.dimension3,
    e.metric1,
    lower(product_name) as product_name
from ga_flagship_ecom.ecom e
join filter_sessions f on f.dimension1 = e.dimension1
),


ee as (
select 
    dimension1,
    dimension3,
    case when sum(case when metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level

    -- approximation for inferring if the product is a download and hence sees all the checkout steps
    case when sum(case when product_name ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
from ee_base
group by 1,2
),

This also failed (I was really optimistic that it would work). Here is the explain output for this attempt:

GroupAggregate  (cost=222818.33..222818.38 rows=1 width=188)
  Group Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
  CTE filter_sessions
    ->  Index Scan using sessions_date_idx on sessions  (cost=0.56..2.78 rows=1 width=76)
          Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
  CTE ee_base
    ->  Nested Loop  (cost=0.56..47603.39 rows=48 width=66)
          ->  CTE Scan on filter_sessions f  (cost=0.00..0.02 rows=1 width=32)
          ->  Index Scan using ecom_dimension1_idx on ecom e  (cost=0.56..47602.77 rows=48 width=51)
                Index Cond: ((dimension1)::text = (f.dimension1)::text)
  CTE ee
    ->  HashAggregate  (cost=1.68..2.40 rows=48 width=48)
          Group Key: ee_base.dimension1, ee_base.dimension3
          ->  CTE Scan on ee_base  (cost=0.00..0.96 rows=48 width=76)
  CTE ecom_events
    ->  Hash Join  (cost=1.68..175209.67 rows=1 width=60)
          Hash Cond: (((ev_1.dimension1)::text = (ee.dimension1)::text) AND (ev_1.dimension3 = ee.dimension3))
          ->  Seq Scan on events ev_1  (cost=0.00..150210.69 rows=3332973 width=52)
                Filter: ((event_category)::text = 'ecom'::text)
          ->  Hash  (cost=0.96..0.96 rows=48 width=48)
                ->  CTE Scan on ee  (cost=0.00..0.96 rows=48 width=48)
  ->  Sort  (cost=0.08..0.08 rows=1 width=236)
        Sort Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
        ->  Nested Loop  (cost=0.00..0.07 rows=1 width=236)
              Join Filter: ((s.dimension1)::text = (ev.dimension1)::text)
              ->  CTE Scan on filter_sessions s  (cost=0.00..0.02 rows=1 width=164)
              ->  CTE Scan on ecom_events ev  (cost=0.00..0.02 rows=1 width=104)

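What does work is creating temp tables. But I would really like to find a solution along these lines, in order of preference:

  1. Using CTEs only
  2. Using a combination of CTEs and subqueries
  3. As a last-resort backup option, using a temp table for filter_sessions only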

Is there anything else I can try here?
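For reference, the backup option (3) might look roughly like the sketch below, assuming the same columns as the filter_sessions CTE. The analyze step is an addition not mentioned in the question: autovacuum does not process temporary tables, so their statistics have to be gathered manually in the session.

```sql
-- Materialize the one-day slice of sessions once, as a temp table.
create temp table filter_sessions as
select
    dimension1,
    dimension2,
    date,
    channel_grouping,
    device_category,
    user_type
from ga_flagship_ecom.sessions
where date >= '2020-02-06'
and date <= '2020-02-06';

-- Gather statistics so later joins see a real row count
-- (autovacuum does not analyze temp tables).
analyze filter_sessions;
```

With this in place, the filter_sessions CTE would be removed from the with clause, and the remaining CTEs ee and ecom_events, as well as the final select, would reference the temp table by the same name.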

sql postgresql
1 answer

You could simply rewrite the CTEs as temp views; these are included in the main query plan.


CREATE TEMP VIEW filter_sessions as
select
    dimension1,
    dimension2,
    zdate,
    channel_grouping,
    device_category,
    user_type
from ga_flagship_ecom.sessions
where zdate >= '2020-02-06'
and zdate <= '2020-02-06'
        ;

CREATE TEMP VIEW ee as
select
    e.dimension1,
    e.dimension3,
    case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level

    -- approximation for inferring if the product is a download and hence sees all the checkout steps
    case when sum(case when lower(product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
from ga_flagship_ecom.ecom e
join filter_sessions f on f.dimension1 = e.dimension1
group by 1,2
        ;

CREATE TEMP VIEW ecom_events as
select
    ev.dimension1,
    ev.dimension3,
    ev.event_action,
    ev.event_label,
    ee.zero_val_product,
    ee.download
from ga_flagship_ecom.events ev
join ee on ee.dimension1 = ev.dimension1 and ee.dimension3 = ev.dimension3
where ev.event_category = 'ecom'
        ;
select
    s.zdate,
    lower(s.channel_grouping) as channel_grouping,
    lower(s.device_category) as device_category,
    lower(s.user_type) as user_type,
    lower(ev.event_action) as event_action,
    lower(coalesce(ev.event_label, 'na')) as event_label,
    ev.zero_val_product,
    ev.download,
    count(distinct s.dimension1) as sessions,
    count(distinct s.dimension2) as daily_users
from filter_sessions s
join ecom_events ev on ev.dimension1 = s.dimension1
group by 1,2,3,4,5,6,7,8;
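
To check that the views really are expanded into the main plan, one option is to run EXPLAIN on the final select and confirm that no CTE Scan nodes appear. This is a sketch that assumes the three temp views above exist in the current session; the column name zdate follows the view definitions above (the question calls the column date).

```sql
-- Plan only, no execution: the output should show scans and joins on the
-- base tables (sessions, ecom, events) rather than "CTE Scan" nodes.
explain
select
    s.zdate,
    lower(s.channel_grouping) as channel_grouping,
    lower(s.device_category) as device_category,
    lower(s.user_type) as user_type,
    lower(ev.event_action) as event_action,
    lower(coalesce(ev.event_label, 'na')) as event_label,
    ev.zero_val_product,
    ev.download,
    count(distinct s.dimension1) as sessions,
    count(distinct s.dimension2) as daily_users
from filter_sessions s
join ecom_events ev on ev.dimension1 = s.dimension1
group by 1,2,3,4,5,6,7,8;

-- Temp views are dropped automatically at the end of the session.
-- To remove them earlier, drop the dependent views first.
drop view ecom_events;
drop view ee;
drop view filter_sessions;
```

For context: in PostgreSQL 11 and earlier, a CTE is always planned separately from the outer query, whereas a view is expanded into it. From PostgreSQL 12 onward a CTE can be declared NOT MATERIALIZED so that it is inlined as well, but that is not available on the 11.6 mentioned in the question.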