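My goal is to select the average of at most 5 records from another table, counting only those records that meet the left join criteria. Suppose we have table one (left) with these records: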
RECNUM | ID | DATE | JOB
1 | cat | 2019.01.01 | meow
2 | dog | 2019.01.01 | bark
And table two (right) with these records:
RECNUM | ID | Action_ID | DATE | REWARD
1 | cat | 1 | 2019.01.02 | 20
2 | cat | 99 | 2018.12.30 | 1
3 | cat | 23 | 2019.12.28 | 20
4 | cat | 54 | 2018.01.01 | 20
5 | cat | 32 | 2018.01.02 | 20
6 | cat | 21 | 2018.01.03 | 20
7 | cat | 43 | 2018.12.28 | 1
8 | cat | 65 | 2018.12.29 | 1
9 | cat | 87 | 2018.09.12 | 1
10 | cat | 98 | 2018.10.11 | 1
11 | dog | 56 | 2018.09.01 | 99
12 | dog | 42 | 2019.09.02 | 99
The result should be:
ID | AVG(Reward_from_latest_5_jobs)
cat | 1
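The criterion to satisfy is: for each job in the left table, find the 5 most recent unique Action_IDs in the right table that have the same ID and an earlier date, and compute the average of their rewards. In other words, the job is a hard one and we don't know what to pay for it, so we take the average of the last five rewards that were paid out. If fewer than 5 are found, return nothing (or null); if more are found, discard the oldest. In the sample data, the five most recent cat rewards before 2019.01.01 are all 1, so the average is 1; dog has only one earlier record, so it is not returned.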
This is roughly how I tried to do it:
SELECT a."ID", COUNT(b."Action_ID"), AVG(b."REWARD")
FROM
(
    SELECT "ID", "DATE"
    FROM :left_table
) a
LEFT JOIN
(
    SELECT "ID", "Action_ID", "DATE", "REWARD"
    FROM :right_table
) b
ON (
    a."ID" = b."ID"
)
WHERE a."DATE" > b."DATE"
GROUP BY a."ID"
HAVING COUNT(b."Action_ID") >= 5;
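However, this computes the average over all Action_IDs that match the criteria, not just the five most recent ones. Can you tell me how to get the expected result? I can use intermediate tables; it does not have to be done in a single SQL statement. Stored procedures are not allowed for this use case. Any input is highly appreciated.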
You can use window functions and then aggregate:
select
    id,
    avg(reward) avg_reward
from (
    select
        t1.id,
        t2.reward,
        -- number of older right-table rows per id
        count(*) over(partition by t1.id) cnt,
        -- 1 = most recent older row
        rank() over(partition by t1.id order by t2.date desc) rn
    from lefttable t1
    -- keep only right-table rows strictly older than the left-table date
    inner join righttable t2 on t1.id = t2.id and t2.date < t1.date
) t
where cnt >= 5 and rn <= 5
group by id
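To check this approach against the sample data from the question, here is a minimal sketch that loads both tables into an in-memory SQLite database from Python and runs a window-function query of the same shape. The table names `left_table` and `right_table`, the ISO-formatted date strings, and the choice of SQLite (version 3.25 or newer, for window function support) are assumptions made for illustration, not part of the original setup. The join keeps only right-table rows strictly older than the left-table date.

```python
import sqlite3

# Sample data from the question. Dates are rewritten as ISO strings
# (an assumption) so that plain string comparison orders them correctly.
LEFT_ROWS = [
    (1, "cat", "2019-01-01", "meow"),
    (2, "dog", "2019-01-01", "bark"),
]
RIGHT_ROWS = [
    (1, "cat", 1, "2019-01-02", 20),
    (2, "cat", 99, "2018-12-30", 1),
    (3, "cat", 23, "2019-12-28", 20),
    (4, "cat", 54, "2018-01-01", 20),
    (5, "cat", 32, "2018-01-02", 20),
    (6, "cat", 21, "2018-01-03", 20),
    (7, "cat", 43, "2018-12-28", 1),
    (8, "cat", 65, "2018-12-29", 1),
    (9, "cat", 87, "2018-09-12", 1),
    (10, "cat", 98, "2018-10-11", 1),
    (11, "dog", 56, "2018-09-01", 99),
    (12, "dog", 42, "2019-09-02", 99),
]

QUERY = """
select id, avg(reward) as avg_reward
from (
    select
        t1.id,
        t2.reward,
        count(*) over (partition by t1.id) as cnt,
        rank() over (partition by t1.id order by t2.date desc) as rn
    from left_table t1
    inner join right_table t2
        on t1.id = t2.id and t2.date < t1.date
) t
where cnt >= 5 and rn <= 5
group by id
"""


def latest_five_average(left_rows, right_rows):
    """Load the rows into an in-memory database and return QUERY's result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table left_table (recnum int, id text, date text, job text)"
    )
    conn.execute(
        "create table right_table "
        "(recnum int, id text, action_id int, date text, reward int)"
    )
    conn.executemany("insert into left_table values (?, ?, ?, ?)", left_rows)
    conn.executemany("insert into right_table values (?, ?, ?, ?, ?)", right_rows)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = latest_five_average(LEFT_ROWS, RIGHT_ROWS)
print(result)  # [('cat', 1.0)]
```

Only cat comes back: it has eight older rows, and the five most recent of them all carry a reward of 1. Dog has a single older row, so the `cnt >= 5` filter drops it.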
Use window functions to get the top five:
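select id, avg(reward)
from (select r.*,
             row_number() over (partition by l.id order by r.date desc) as seqnum
      from table1 l join
           table2 r
           on l.id = r.id and l.date > r.date
     ) r
where seqnum <= 5          -- keep only the five most recent older rows per id
group by id
having count(*) = 5;       -- drop ids with fewer than five such rows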
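One practical difference between `row_number()` and `rank()` shows up when two rewards share the same date. The sketch below uses hypothetical data (a `rewards` table with six rows, where the two oldest rows tie on date) and SQLite from Python, both assumptions for illustration, to count how many rows each function lets through a `<= 5` filter.

```python
import sqlite3

# Hypothetical data: six rewards for one id. The two oldest rows share
# a date, so they tie for fifth place when ordered by date descending.
ROWS = [
    ("cat", "2018-12-30", 1),
    ("cat", "2018-12-29", 1),
    ("cat", "2018-12-28", 1),
    ("cat", "2018-12-27", 1),
    ("cat", "2018-12-26", 1),
    ("cat", "2018-12-26", 100),
]

QUERY = """
select
    sum(case when rk <= 5 then 1 else 0 end) as kept_by_rank,
    sum(case when rn <= 5 then 1 else 0 end) as kept_by_row_number
from (
    select
        rank() over (partition by id order by date desc) as rk,
        row_number() over (partition by id order by date desc) as rn
    from rewards
) t
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table rewards (id text, date text, reward int)")
conn.executemany("insert into rewards values (?, ?, ?)", ROWS)
kept_by_rank, kept_by_row_number = conn.execute(QUERY).fetchone()
conn.close()

print(kept_by_rank, kept_by_row_number)  # 6 5
```

`rank()` assigns 5 to both tied rows, so six rows pass the filter and the average is taken over more than five rewards. `row_number()` always keeps exactly five, but which of the tied rows it keeps is not defined; adding a tiebreaker to the `order by` (for example the action id) would make the choice deterministic.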