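Currently I have a fairly large query. It first uses count(), grouped by event and date, to aggregate daily, weekly, and monthly counts into intermediate tables. It then uses avg(), grouped by event only, to select the average count from each intermediate table and unions the results. Because I want a separate column for each of daily, weekly, and monthly, it puts a filler value of 0 into the empty columns. The query is long, and I feel like I am doing a lot of repetitive work. Is there a way to write this query better or make it smaller? I have not really written queries like this before, so I am not sure.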
WITH monthly_counts as (
    SELECT
        event,
        count(*) as count
    FROM tracking_stuff
    WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
    GROUP BY event, date_trunc('month', created_at)
),
weekly_counts as (
    SELECT
        event,
        count(*) as count
    FROM tracking_stuff
    WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
    GROUP BY event, date_trunc('week', created_at)
),
daily_counts as (
    SELECT
        event,
        count(*) as count
    FROM tracking_stuff
    WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
    GROUP BY event, date_trunc('day', created_at)
),
query as (
    SELECT
        event,
        0 as daily_avg,
        0 as weekly_avg,
        avg(count) as monthly_avg
    FROM monthly_counts
    GROUP BY event
    UNION
    SELECT
        event,
        0 as daily_avg,
        avg(count) as weekly_avg,
        0 as monthly_avg
    FROM weekly_counts
    GROUP BY event
    UNION
    SELECT
        event,
        avg(count) as daily_avg,
        0 as weekly_avg,
        0 as monthly_avg
    FROM daily_counts
    GROUP BY event
)
SELECT
    event,
    sum(daily_avg) as daily_avg,
    sum(weekly_avg) as weekly_avg,
    sum(monthly_avg) as monthly_avg
FROM query
GROUP BY event;
I would write the query this way:
select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
join (
    select event, avg(count) weekly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('week', created_at)
    ) s
    group by 1
) weekly using(event)
join (
    select event, avg(count) daily_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('day', created_at)
    ) s
    group by 1
) daily using(event)
order by 1;
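To make concrete what the three averages mean, here is a minimal sketch on a toy dataset. It is an illustration only, not part of the original answer. It uses Python's built-in sqlite3 module rather than PostgreSQL, so strftime stands in for date_trunc (with %W as a rough Monday-based week bucket). The helper avg_count_per_period and the sample rows are made up for the example.

```python
import sqlite3

# Toy stand-in for tracking_stuff (SQLite in memory, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("create table tracking_stuff (event text, created_at text)")
conn.executemany(
    "insert into tracking_stuff values (?, ?)",
    [
        ("thing", "2016-01-04"),
        ("thing", "2016-01-04"),
        ("thing", "2016-01-05"),
        ("thing", "2016-01-12"),
        ("thing", "2016-01-12"),
        ("thing", "2016-01-12"),
        ("other", "2016-01-04"),  # filtered out by the where clause
    ],
)


def avg_count_per_period(fmt):
    """Hypothetical helper: average row count per period for each event.

    fmt is a strftime pattern that plays the role of date_trunc's unit.
    """
    sql = """
        select event, avg(cnt)
        from (
            select event, count(*) as cnt
            from tracking_stuff
            where event in ('thing', 'thing2', 'thing3')
            group by event, strftime(?, created_at)
        ) s
        group by event
    """
    return dict(conn.execute(sql, (fmt,)).fetchall())


daily = avg_count_per_period("%Y-%m-%d")   # day buckets hold 2, 1, 3 rows
weekly = avg_count_per_period("%Y-%W")     # week buckets hold 3, 3 rows
monthly = avg_count_per_period("%Y-%m")    # one month bucket with 6 rows
print(daily, weekly, monthly)
```

The inner query counts rows per event and period, and the outer query averages those counts per event, which is the same two-step shape as each joined subquery above.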
If the where condition eliminates a large part of the data (say, more than half), using a CTE can slightly speed up query execution:
with the_data as (
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
)
select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from the_data
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
-- etc ...
Out of curiosity, I ran a test on the following data:
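-- note: random_int() is not a PostgreSQL built-in; it is assumed here to be
-- a user-defined helper that returns a random integer up to its argument
create table tracking_stuff (event text, created_at timestamp);
insert into tracking_stuff
select 'thing' || random_int(9), '2016-01-01'::date + random_int(365)
from generate_series(1, 1000000);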
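In each query I replaced thing with thing1 (every generated event has the form thingN, so a bare thing would match nothing). The queries therefore eliminate about 2/3 of the rows.
Average execution time over 10 runs: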
Original query            1106 ms
My query without cte      1077 ms
My query with cte          902 ms
Clodoaldo's query         5187 ms
grouping sets
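The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates are computed for each group just as for simple GROUP BY clauses, and then the results are returned.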
select event,
    avg(total) filter (where day is not null) as avg_day,
    avg(total) filter (where week is not null) as avg_week,
    avg(total) filter (where month is not null) as avg_month
from (
    select
        event,
        date_trunc('day', created_at) as day,
        date_trunc('week', created_at) as week,
        date_trunc('month', created_at) as month,
        count(*) as total
    from tracking_stuff
    where event in ('thing','thing2','thing3')
    group by grouping sets ((event, 2), (event, 3), (event, 4))
) s
group by event
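To illustrate why the "is not null" filters pick out the right rows, the sketch below emulates the three grouping sets with a UNION ALL of three ordinary GROUP BY queries. It is an assumption-laden illustration, not the answer's method. It runs on SQLite through Python's sqlite3 module, where GROUPING SETS does not exist, with strftime in place of date_trunc and a CASE expression in place of the FILTER clause. The helper grouping_branch and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tracking_stuff (event text, created_at text)")
conn.executemany(
    "insert into tracking_stuff values (?, ?)",
    [("thing", "2016-01-04")] * 2
    + [("thing", "2016-01-05")]
    + [("thing", "2016-01-12")] * 3,
)


def grouping_branch(fmt, position):
    """Hypothetical helper: one emulated grouping set.

    Only the bucket column at `position` is filled; the other two are NULL,
    mirroring what a grouping set returns for columns it does not group by.
    """
    cols = ["null", "null", "null"]
    cols[position] = f"strftime('{fmt}', created_at)"
    return f"""
        select event, {cols[0]} as day, {cols[1]} as week, {cols[2]} as month,
               count(*) as total
        from tracking_stuff
        where event in ('thing', 'thing2', 'thing3')
        group by event, strftime('{fmt}', created_at)
    """


formats = ["%Y-%m-%d", "%Y-%W", "%Y-%m"]  # day, week (Monday-based), month
union_sql = " union all ".join(
    grouping_branch(fmt, i) for i, fmt in enumerate(formats)
)

# avg() skips NULLs, so each CASE keeps only the rows of one grouping set.
sql = f"""
    select event,
           avg(case when day   is not null then total end) as avg_day,
           avg(case when week  is not null then total end) as avg_week,
           avg(case when month is not null then total end) as avg_month
    from ({union_sql}) s
    group by event
"""
result = conn.execute(sql).fetchall()
print(result)
```

Each branch contributes rows in which only its own bucket column is non-NULL, so the outer averages never mix daily, weekly, and monthly counts.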
To learn more about grouping sets, consider the following tutorials: one, two