I have the following tables to model a book database:
CREATE TABLE Country (
ISO_3166 CHAR(2) PRIMARY KEY,
CountryName VARCHAR(256),
CID varchar(16)
);
CREATE TABLE Users (
UID INT PRIMARY KEY,
Username VARCHAR(256),
DoB DATE,
Age INT,
ISO_3166 CHAR(2) REFERENCES Country (ISO_3166)
);
CREATE TABLE Book (
ISBN VARCHAR(17) PRIMARY KEY,
Title VARCHAR(256),
Published DATE,
Pages INT,
Language VARCHAR(256)
);
CREATE TABLE Rating (
UID INT REFERENCES Users (UID),
ISBN VARCHAR(17) REFERENCES Book (ISBN),
PRIMARY KEY (UID,ISBN),
Rating int
);
I now want to find the user with the most ratings in each country. I can use this query:
SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
FROM Country
JOIN Users ON Users.ISO_3166 = Country.ISO_3166
JOIN Rating ON Users.UID = Rating.UID
GROUP BY Country.CID, CountryName, Username
ORDER BY CountryName ASC
which returns the number of ratings for each user in this format:
Countryname | Username | Number of Ratings of this user
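To see what that query returns, here is a minimal sketch that runs it against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite stands in for Postgres only because this query uses plain SQL. The sample rows and the helper name per_user_rating_counts are hypothetical.

```python
import sqlite3

# Schema from the question (Rating's PRIMARY KEY moved after its columns for SQLite),
# plus hypothetical sample rows: chloe and dan tie in France, Italy has no users.
SETUP = """
CREATE TABLE Country (ISO_3166 CHAR(2) PRIMARY KEY, CountryName VARCHAR(256), CID VARCHAR(16));
CREATE TABLE Users (UID INT PRIMARY KEY, Username VARCHAR(256), DoB DATE, Age INT,
                    ISO_3166 CHAR(2) REFERENCES Country (ISO_3166));
CREATE TABLE Book (ISBN VARCHAR(17) PRIMARY KEY, Title VARCHAR(256), Published DATE,
                   Pages INT, Language VARCHAR(256));
CREATE TABLE Rating (UID INT REFERENCES Users (UID), ISBN VARCHAR(17) REFERENCES Book (ISBN),
                     Rating INT, PRIMARY KEY (UID, ISBN));
INSERT INTO Country VALUES ('DE', 'Germany', 'c1'), ('FR', 'France', 'c2'), ('IT', 'Italy', 'c3');
INSERT INTO Users (UID, Username, ISO_3166) VALUES
  (1, 'anna', 'DE'), (2, 'ben', 'DE'), (3, 'chloe', 'FR'), (4, 'dan', 'FR');
INSERT INTO Book (ISBN, Title) VALUES ('b1', 'One'), ('b2', 'Two'), ('b3', 'Three');
INSERT INTO Rating VALUES
  (1, 'b1', 5), (1, 'b2', 4), (1, 'b3', 3), (2, 'b1', 2),
  (3, 'b1', 4), (3, 'b2', 5), (4, 'b2', 1), (4, 'b3', 2);
"""

# The question's per-user count query; "Username ASC" is added only to make
# the row order deterministic for this demo.
COUNTS = """
SELECT Country.CountryName AS CountryName, Users.Username AS Username,
       COUNT(Rating.Rating) AS NumRatings
FROM Country
JOIN Users ON Users.ISO_3166 = Country.ISO_3166
JOIN Rating ON Users.UID = Rating.UID
GROUP BY Country.CID, CountryName, Username
ORDER BY CountryName ASC, Username ASC
"""


def per_user_rating_counts() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(COUNTS).fetchall()
    finally:
        conn.close()


for row in per_user_rating_counts():
    print(row)  # Italy never appears: the inner joins drop countries without ratings
```

Each user gets one row with their count, and the tie between chloe and dan is visible here, before any "pick the top user" step.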
I also managed to write the following query, which gives me one user per country, but not the one with the most ratings:
SELECT DISTINCT ON (CountryName)
CountryName, Username, MAX(NumRatings)
FROM (
SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
FROM Country
JOIN Users ON Users.ISO_3166 = Country.ISO_3166
JOIN Rating ON Users.UID = Rating.UID
GROUP BY Country.CID, CountryName, Username
ORDER BY CountryName ASC) AS MyTable
GROUP BY CountryName, Username, NumRatings
ORDER BY CountryName ASC;
But how do I write a query that selects the user with the most ratings in each country?
You were so close:
SELECT DISTINCT ON (CountryName)
CountryName, Username, NumRatings
FROM(
SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
FROM Country
JOIN Users ON Users.ISO_3166 = Country.ISO_3166
JOIN Rating ON Users.UID = Rating.UID
GROUP BY Country.CID, CountryName, Username
ORDER BY CountryName ASC) AS MyTable
WHERE TRUE --no filtering needed
ORDER BY CountryName ASC, NumRatings DESC
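With DISTINCT ON, Postgres keeps only the first row of each set of rows that share the same DISTINCT ON value, and the ORDER BY clause decides which row comes first (the DISTINCT ON expressions must match the leftmost ORDER BY expressions). Here, sorting by NumRatings descending within each CountryName puts the row with the highest NumRatings first, so that is the row returned for each country.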
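The DISTINCT ON rule can be mimicked in plain Python to make the "first row per group" behaviour concrete: sort by the DISTINCT ON key, then by the tie-breaking key, and keep the first row of each group. This only illustrates the semantics, not how Postgres implements it; the helper distinct_on_first and the sample rows are hypothetical.

```python
from itertools import groupby
from typing import Iterable


def distinct_on_first(rows: Iterable[tuple[str, str, int]]) -> list[tuple[str, str, int]]:
    """Mimic SELECT DISTINCT ON (country) ... ORDER BY country ASC, num_ratings DESC."""
    # Sort the way the ORDER BY clause would: country ascending, then count descending.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    # DISTINCT ON keeps only the first row of each run of equal country values.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


per_user_counts = [
    ("Germany", "anna", 3),
    ("Germany", "ben", 1),
    ("France", "chloe", 2),
    ("France", "dan", 2),  # ties with chloe, but only one French row survives
]

for row in distinct_on_first(per_user_counts):
    print(row)
# ('France', 'chloe', 2)
# ('Germany', 'anna', 3)
```

dan is dropped even though he ties with chloe. In Python the stable sort keeps chloe because she comes first in the input; in Postgres the surviving row among ties is unpredictable unless more ORDER BY columns break the tie.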
DISTINCT ON is well suited to getting the one user (as the word "distinct" implies) with the most ratings per country.
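But you want to find the users with the most ratings in each country, and several users can share the top number of ratings in a country.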
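I would aggregate the ratings first and then join to the users table, in a CTE. Then use a LATERAL subquery with FETCH FIRST 1 ROWS WITH TIES to pick one or more winners per country (WITH TIES needs Postgres 13 or later):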
WITH agg AS (
SELECT u.iso_3166, u.uid, u.username, r.numratings
FROM (
SELECT uid, count(*) AS numratings
FROM rating r
GROUP BY 1
) r
JOIN users u USING (uid)
)
SELECT c.countryname, a.username, a.numratings
FROM country c
LEFT JOIN LATERAL (
SELECT *
FROM agg a
WHERE a.iso_3166 = c.iso_3166
ORDER BY a.numratings DESC
FETCH FIRST 1 ROWS WITH TIES -- !
) a ON true;
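SQLite supports neither LATERAL nor FETCH FIRST ... WITH TIES, so this runnable sketch (Python's built-in sqlite3, SQLite 3.25+ for window functions) swaps in a different technique with the same result: rank() over each country's users, keeping every row with rank 1. It keeps the "aggregate first, then join" CTE and the LEFT JOIN from country, so a country without ratings still appears with NULLs. The sample rows and the helper name top_raters_per_country are hypothetical.

```python
import sqlite3

# Question's schema (Rating's PRIMARY KEY moved after its columns for SQLite) and
# hypothetical rows: chloe and dan tie in France, Italy has no users.
SETUP = """
CREATE TABLE Country (ISO_3166 CHAR(2) PRIMARY KEY, CountryName VARCHAR(256), CID VARCHAR(16));
CREATE TABLE Users (UID INT PRIMARY KEY, Username VARCHAR(256), DoB DATE, Age INT,
                    ISO_3166 CHAR(2) REFERENCES Country (ISO_3166));
CREATE TABLE Book (ISBN VARCHAR(17) PRIMARY KEY, Title VARCHAR(256), Published DATE,
                   Pages INT, Language VARCHAR(256));
CREATE TABLE Rating (UID INT REFERENCES Users (UID), ISBN VARCHAR(17) REFERENCES Book (ISBN),
                     Rating INT, PRIMARY KEY (UID, ISBN));
INSERT INTO Country VALUES ('DE', 'Germany', 'c1'), ('FR', 'France', 'c2'), ('IT', 'Italy', 'c3');
INSERT INTO Users (UID, Username, ISO_3166) VALUES
  (1, 'anna', 'DE'), (2, 'ben', 'DE'), (3, 'chloe', 'FR'), (4, 'dan', 'FR');
INSERT INTO Book (ISBN, Title) VALUES ('b1', 'One'), ('b2', 'Two'), ('b3', 'Three');
INSERT INTO Rating VALUES
  (1, 'b1', 5), (1, 'b2', 4), (1, 'b3', 3), (2, 'b1', 2),
  (3, 'b1', 4), (3, 'b2', 5), (4, 'b2', 1), (4, 'b3', 2);
"""

WINNERS = """
WITH agg AS (
  -- Aggregate ratings per user first, then join the users table.
  SELECT u.iso_3166, u.uid, u.username, r.numratings
  FROM (SELECT uid, count(*) AS numratings FROM rating GROUP BY uid) r
  JOIN users u USING (uid)
), ranked AS (
  -- rank() gives tied users the same rank, so every winner gets rank 1.
  SELECT agg.*, rank() OVER (PARTITION BY iso_3166 ORDER BY numratings DESC) AS rnk
  FROM agg
)
SELECT c.countryname, w.username, w.numratings
FROM country c
LEFT JOIN ranked w ON w.iso_3166 = c.iso_3166 AND w.rnk = 1
ORDER BY c.countryname, w.username
"""


def top_raters_per_country() -> list[tuple]:
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(WINNERS).fetchall()
    finally:
        conn.close()


for row in top_raters_per_country():
    print(row)
```

Both French users come back because they tie, and Italy shows up with NULL (None) user and count, matching what the LEFT JOIN LATERAL ... ON true version is meant to return in Postgres.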
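Notably, you do not want GROUP BY Country.CID: CID is not the primary key of Country, and nothing declares it unique. country.ISO_3166 is the PK, so use that instead. (I streamlined the query so that it does not need the country in a GROUP BY at all.)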