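I am looking for a query that groups by ID1 and ID2 but returns only the ID2 values that are associated with more than one distinct ID1.
I have data like this: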
+-----+-----+
| ID1 | ID2 |
+-----+-----+
| 1   | A   |
| 1   | A   |
| 2   | A   |
| 3   | B   |
| 3   | B   |
| 4   | C   |
| 5   | C   |
| 6   | D   |
| 6   | D   |
| 7   | E   |
+-----+-----+
Ideally, my output would look like this:
+-----+
| ID2 |
+-----+
| A   |
| C   |
+-----+
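Note that ID2 = 'B' and ID2 = 'D' each have more than one record, but all of their records share the same ID1. In the case of A, even though the ID1 value 1 is duplicated, I still want A selected because it also has another distinct ID1, namely 2.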
You can use COUNT(DISTINCT ID1) in a HAVING clause, as follows:
SELECT ID2
FROM tbl
GROUP BY ID2
HAVING COUNT(DISTINCT ID1) > 1
select ID2
from t
group by ID2
having count(distinct ID1) > 1
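I am not sure whether Impala supports count(distinct), but it is fairly standard, so I will assume it does. The having clause is applied after group by, so it keeps only the groups you are looking for.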
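To check the count(distinct) query against the sample data from the question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Impala. The table name t, the column types, and the helper name ids_with_multiple_id1 are assumptions made for illustration, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Sample rows from the question, as (ID1, ID2) pairs.
ROWS = [(1, "A"), (1, "A"), (2, "A"), (3, "B"), (3, "B"),
        (4, "C"), (5, "C"), (6, "D"), (6, "D"), (7, "E")]


def ids_with_multiple_id1(rows):
    """Return the ID2 values linked to more than one distinct ID1."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Impala
    try:
        conn.execute("CREATE TABLE t (ID1 INTEGER, ID2 TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT ID2 FROM t "
            "GROUP BY ID2 "
            "HAVING COUNT(DISTINCT ID1) > 1 "  # filter applied per group
            "ORDER BY ID2"
        )
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


print(ids_with_multiple_id1(ROWS))  # prints ['A', 'C']
```

B and D are dropped because each of them has only one distinct ID1, and E is dropped because it has a single row.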
I would suggest:
select ID2
from t
group by ID2
having min(ID1) <> max(ID1);
I think min() and max() have better performance characteristics than count(distinct).
In fact, I would also expect this to perform better than count(distinct):
select id2
from (select distinct id1, id2
      from t
     ) x
group by id2
having count(*) > 1;
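The three formulations in this thread should return the same rows for the sample data. The sketch below runs all of them side by side, again using Python's sqlite3 module as a stand-in for Impala. The table name t, the dictionary keys, and the helper name run_all_variants are hypothetical, and the sketch only checks that the results agree; it says nothing about relative performance on Impala.

```python
import sqlite3

# Sample rows from the question, as (ID1, ID2) pairs.
ROWS = [(1, "A"), (1, "A"), (2, "A"), (3, "B"), (3, "B"),
        (4, "C"), (5, "C"), (6, "D"), (6, "D"), (7, "E")]

# The three query variants proposed in the answers.
QUERIES = {
    "count_distinct": (
        "select ID2 from t group by ID2 "
        "having count(distinct ID1) > 1"
    ),
    "min_max": (
        "select ID2 from t group by ID2 "
        "having min(ID1) <> max(ID1)"
    ),
    "distinct_subquery": (
        "select id2 from (select distinct id1, id2 from t) x "
        "group by id2 having count(*) > 1"
    ),
}


def run_all_variants(rows):
    """Run every variant on the same data; return sorted ID2 lists by name."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Impala
    try:
        conn.execute("CREATE TABLE t (ID1 INTEGER, ID2 TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return {
            name: sorted(row[0] for row in conn.execute(sql))
            for name, sql in QUERIES.items()
        }
    finally:
        conn.close()


for name, result in run_all_variants(ROWS).items():
    print(name, result)
```

Each variant yields A and C on this data, so choosing between them comes down to how the engine executes each one, which is best measured on the real table.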