I have a query like this:
SELECT product_id,
site,
category_id,
session_time,
sum(cast(coalesce("#clicks", 0) AS bigint)) AS clicks
FROM df
WHERE site IN ('com', 'co')
AND session_time = DATE('2020-02-27')
GROUP BY product_id, site, session_time, category_id
ORDER BY clicks desc
LIMIT 10
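But now I want to see the top 10 product_id values for each site and category_id, based on clicks. When I add LIMIT, it only shows the top 10 products overall; it does not group them by category_id and shop_id.
How can I do this?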
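You can use window functions: RANK() ranks the records within each site/category partition by descending clicks, and you can then filter in an outer query to keep the top records of each partition:
SELECT *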
FROM (
SELECT
product_id,
site,
category_id,
session_time,
SUM("#clicks") clicks,
RANK() OVER(PARTITION BY site, category_id ORDER BY sum("#clicks") DESC) rn
FROM df
WHERE
site IN ('com', 'co')
AND session_time = DATE('2020-02-27')
GROUP BY product_id, site, session_time, category_id
) t
WHERE rn <= 10
ORDER BY site, category_id, clicks desc
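One caveat: RANK() gives tied rows the same rank, so a partition can return more than 10 rows when several products have the same click count. If exactly 10 rows per partition are needed, ROW_NUMBER() with a tie-breaker is one option. The following is only a sketch under that assumption; using product_id as the tie-breaker is my own choice for illustration, not something the question specifies.

```sql
-- Variant of the query above that returns at most 10 rows per (site, category_id).
-- ROW_NUMBER() numbers rows 1, 2, 3, ... even when click counts are tied.
SELECT *
FROM (
    SELECT
        product_id,
        site,
        category_id,
        session_time,
        SUM("#clicks") clicks,
        ROW_NUMBER() OVER(
            PARTITION BY site, category_id
            -- product_id is an assumed tie-breaker to make the order deterministic
            ORDER BY SUM("#clicks") DESC, product_id
        ) rn
    FROM df
    WHERE
        site IN ('com', 'co')
        AND session_time = DATE('2020-02-27')
    GROUP BY product_id, site, session_time, category_id
) t
WHERE rn <= 10
ORDER BY site, category_id, clicks desc
```

To rank per site only, as the first sentence of the question could also be read, drop category_id from the PARTITION BY clause.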
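I am unclear on why you need the coalesce()/cast() logic inside sum() (like other aggregate functions, sum() ignores null values, and #clicks appears to already be a number), so I removed it. You can add it back if you need it for some reason I could not think of.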
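To illustrate the point about null values, here is a minimal sketch that uses inline VALUES rows instead of the real df table; the column name x and the sample values are made up. It also shows the one case I can think of where coalesce() might still matter: when every input is null, sum() returns null rather than 0, and wrapping the aggregate itself handles that.

```sql
-- Hypothetical inline data, not the real df table.
-- sum() skips the null row and adds the remaining values: 1 + 3 = 4.
SELECT SUM(x) AS total
FROM (VALUES (1), (CAST(NULL AS bigint)), (3)) AS t(x);

-- When every input is null, sum() yields null rather than 0,
-- so coalesce() goes around the aggregate if a 0 is wanted instead.
SELECT COALESCE(SUM(x), 0) AS total
FROM (VALUES (CAST(NULL AS bigint))) AS t(x);
```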