This is part of my table, shown sorted by value1:
uniquekey city test2 test3 value1
0 001 NYC 40.724159 -73.754968 32
1 002 NYC 40.753028 -73.921620 22
2 003 LAX 40.845642 -73.902110 20
3 003 LAX 40.845642 -73.902110 19
4 002 NYC 40.753028 -73.921620 18
5 004 LAX 40.870346 -73.904400 17
6 005 LAX 40.849560 -73.834010 17
7 006 LAX 40.851080 -73.848611 17
8 002 NYC 40.753028 -73.921620 16
9 007 NYC 40.762978 -73.831980 16
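I want the top 1 for NYC and the top 1 for LAX.
The tricky part is that simply showing row 0 and row 2 is not enough, because several rows share the same uniquekey: rows 2 and 3 for LAX, and rows 1, 4 and 8 for NYC.
The expected output should be: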
city test2 test3 max(value1)
0 NYC 40.724159 -73.754968 66 <----32+18+16
1 LAX 40.845642 -73.902110 39 <----20+19
Here is my code:
query = '''
select city, test2, test3, max(value1)
from nypd
where city IN ('NYC','LAX')
group by city
order by value1 DESC
'''
It only shows the top 2 rows, without summing:
city test2 test3 max(value1)
0 NYC 40.724159 -73.754968 32
1 LAX 40.845642 -73.902110 20
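You first need to aggregate to get the sum of value1 for each combination of uniquekey, city, test2 and test3.
Then, to get the combination with the highest sum for each city, you can filter on the row_number() window function, partitioned by city and ordered by the sum of value1 in descending order.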
SELECT city,
test2,
test3,
value1
FROM (SELECT city,
test2,
test3,
sum(value1) value1,
row_number() OVER (PARTITION BY city
ORDER BY sum(value1) DESC) rn
FROM nypd
WHERE city IN ('NYC', 'LAX')
GROUP BY uniquekey,
city,
test2,
test3) x
WHERE rn = 1;
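To try the window-function query against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The column types, the helper name top_per_city and the trailing ORDER BY (added only to make the output order deterministic) are assumptions for illustration. It needs SQLite 3.25.0 or later.

```python
import sqlite3

# Sample rows copied from the question: (uniquekey, city, test2, test3, value1).
ROWS = [
    ("001", "NYC", 40.724159, -73.754968, 32),
    ("002", "NYC", 40.753028, -73.921620, 22),
    ("003", "LAX", 40.845642, -73.902110, 20),
    ("003", "LAX", 40.845642, -73.902110, 19),
    ("002", "NYC", 40.753028, -73.921620, 18),
    ("004", "LAX", 40.870346, -73.904400, 17),
    ("005", "LAX", 40.849560, -73.834010, 17),
    ("006", "LAX", 40.851080, -73.848611, 17),
    ("002", "NYC", 40.753028, -73.921620, 16),
    ("007", "NYC", 40.762978, -73.831980, 16),
]

# Same query as above; ORDER BY is an addition so the result order is stable.
QUERY = """
SELECT city, test2, test3, value1
FROM (SELECT city, test2, test3,
             sum(value1) AS value1,
             row_number() OVER (PARTITION BY city
                                ORDER BY sum(value1) DESC) AS rn
      FROM nypd
      WHERE city IN ('NYC', 'LAX')
      GROUP BY uniquekey, city, test2, test3) x
WHERE rn = 1
ORDER BY value1 DESC
"""


def top_per_city(rows):
    """Load rows into an in-memory table (assumed schema) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nypd (uniquekey TEXT, city TEXT, "
        "test2 REAL, test3 REAL, value1 INTEGER)"
    )
    conn.executemany("INSERT INTO nypd VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_per_city(ROWS):
        print(row)
```

On these rows the query returns uniquekey 002 for NYC (22 + 18 + 16 = 56) and uniquekey 003 for LAX (20 + 19 = 39). The NYC figure differs from the 66 in the question because that total adds value1 from row 0 (uniquekey 001) to rows 4 and 8 (uniquekey 002).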
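However, SQLite versions older than 3.25.0 do not support row_number(). In that case you can use NOT EXISTS with a correlated subquery to check that no other row for the same city has a greater sum or, in the case of a tie, a greater uniquekey. The aggregation can be put in a CTE so that it does not have to be repeated in the subquery.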
WITH cte
AS
(
SELECT uniquekey,
city,
test2,
test3,
sum(value1) value1
FROM nypd
WHERE city IN ('NYC', 'LAX')
GROUP BY uniquekey,
city,
test2,
test3
)
SELECT c1.city,
c1.test2,
c1.test3,
c1.value1
FROM cte c1
WHERE NOT EXISTS (SELECT *
FROM cte c2
WHERE c2.city = c1.city
AND (c2.value1 > c1.value1
OR c2.value1 = c1.value1
AND c2.uniquekey > c1.uniquekey));
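The tie-breaking behaviour of the NOT EXISTS variant can be checked the same way. The sketch below is again only an illustration with Python's sqlite3 module. The helper name top_per_city_no_window, the extra uniquekey column in the output, the explicit parentheses around the tie condition and the final ORDER BY are assumptions added to make the result easier to read.

```python
import sqlite3

# Sample rows copied from the question: (uniquekey, city, test2, test3, value1).
ROWS = [
    ("001", "NYC", 40.724159, -73.754968, 32),
    ("002", "NYC", 40.753028, -73.921620, 22),
    ("003", "LAX", 40.845642, -73.902110, 20),
    ("003", "LAX", 40.845642, -73.902110, 19),
    ("002", "NYC", 40.753028, -73.921620, 18),
    ("004", "LAX", 40.870346, -73.904400, 17),
    ("005", "LAX", 40.849560, -73.834010, 17),
    ("006", "LAX", 40.851080, -73.848611, 17),
    ("002", "NYC", 40.753028, -73.921620, 16),
    ("007", "NYC", 40.762978, -73.831980, 16),
]

# CTE + NOT EXISTS: keep a group only if no other group in the same city
# has a larger sum, or the same sum and a larger uniquekey.
QUERY = """
WITH cte AS (
    SELECT uniquekey, city, test2, test3, sum(value1) AS value1
    FROM nypd
    WHERE city IN ('NYC', 'LAX')
    GROUP BY uniquekey, city, test2, test3
)
SELECT c1.uniquekey, c1.city, c1.test2, c1.test3, c1.value1
FROM cte c1
WHERE NOT EXISTS (SELECT *
                  FROM cte c2
                  WHERE c2.city = c1.city
                    AND (c2.value1 > c1.value1
                         OR (c2.value1 = c1.value1
                             AND c2.uniquekey > c1.uniquekey)))
ORDER BY c1.value1 DESC
"""


def top_per_city_no_window(rows):
    """Load rows into an in-memory table (assumed schema) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nypd (uniquekey TEXT, city TEXT, "
        "test2 REAL, test3 REAL, value1 INTEGER)"
    )
    conn.executemany("INSERT INTO nypd VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(top_per_city_no_window(ROWS))
    # Dropping uniquekey 003 leaves three LAX groups tied at 17.
    print(top_per_city_no_window([r for r in ROWS if r[0] != "003"]))
```

With the full sample, the result matches the window-function query. With uniquekey 003 removed, the three remaining LAX groups all sum to 17, and the tie is resolved in favour of the largest uniquekey (006).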
How about this?
select n.city, n.test2, n.test3, sum(n.value1)
from nypd n
where n.uniquekey = (select max(n2.uniquekey)
                     from nypd n2
                     where n2.city = n.city
                    )
group by n.city, n.test2, n.test3;