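I have several duplicate IDs that I need to reduce to a single row each. Normally I would use an aggregate to combine the column values (as a sum, an average, and so on). Here I want to keep, for each id, the row that has the largest number of non-empty values across all columns.
Given this table: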
id col1 col2 col3
1 a '' ''
1 a b ''
2 x y ''
1 a b c
2 s '' ''
How can I select:
id col1 col2 col3
2 x y ''
1 a b c
Start with this query:
select id,
       max(
         (case when col1 is not null then 1 else 0 end) +
         (case when col2 is not null then 1 else 0 end) +
         (case when col3 is not null then 1 else 0 end)
       ) maxnotnulls
from tablename
group by id
This gives the maximum number of non-null columns for each id. You can then join the table to the query above, like this:
select t.*
from tablename t
inner join (
  select id,
         max(
           (case when col1 is not null then 1 else 0 end) +
           (case when col2 is not null then 1 else 0 end) +
           (case when col3 is not null then 1 else 0 end)
         ) maxnotnulls
  from tablename
  group by id
) g
  on g.id = t.id
 and (case when t.col1 is not null then 1 else 0 end) +
     (case when t.col2 is not null then 1 else 0 end) +
     (case when t.col3 is not null then 1 else 0 end) = g.maxnotnulls
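To check the join against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite, the helper name fullest_rows_by_join, the explicit column list, and storing the empty strings as NULL are all assumptions made for illustration; they are not part of the answer above.

```python
import sqlite3

# Sample rows from the question, with the empty strings stored as NULL (None).
ROWS = [
    (1, "a", None, None),
    (1, "a", "b", None),
    (2, "x", "y", None),
    (1, "a", "b", "c"),
    (2, "s", None, None),
]

# The join from the answer, with an ORDER BY added so the output is stable.
JOIN_QUERY = """
select t.id, t.col1, t.col2, t.col3
from tablename t
inner join (
  select id,
         max(
           (case when col1 is not null then 1 else 0 end) +
           (case when col2 is not null then 1 else 0 end) +
           (case when col3 is not null then 1 else 0 end)
         ) as maxnotnulls
  from tablename
  group by id
) g
  on g.id = t.id
 and (case when t.col1 is not null then 1 else 0 end) +
     (case when t.col2 is not null then 1 else 0 end) +
     (case when t.col3 is not null then 1 else 0 end) = g.maxnotnulls
order by t.id
"""


def fullest_rows_by_join(rows):
    """Load rows into a throwaway table and return the rows with the most non-null columns per id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table tablename (id integer, col1 text, col2 text, col3 text)"
        )
        conn.executemany("insert into tablename values (?, ?, ?, ?)", rows)
        return conn.execute(JOIN_QUERY).fetchall()
    finally:
        conn.close()
```

For the sample rows, fullest_rows_by_join(ROWS) returns the (1, a, b, c) row and the (2, x, y, NULL) row. If two rows of the same id tie for the maximum count, the join returns both of them.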
Assuming the empty strings are actually NULL, the simplest method in standard SQL is:
select t.*
from (select t.*,
             row_number() over (partition by id
                                order by ((case when col1 is not null then 1 else 0 end) +
                                          (case when col2 is not null then 1 else 0 end) +
                                          (case when col3 is not null then 1 else 0 end)
                                         ) desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
Of course, this is easily adapted to compare against empty strings instead.
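One possible adaptation, sketched under assumptions: wrapping each column in nullif(col, '') makes both NULL and the empty string count as empty. The harness below runs that variant in an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The table name tablename, the helper name fullest_row_per_id, and the use of nullif are illustrative choices, not something the answer prescribes.

```python
import sqlite3

# Sample rows from the question, this time keeping the empty strings as ''.
EMPTY_STRING_ROWS = [
    (1, "a", "", ""),
    (1, "a", "b", ""),
    (2, "x", "y", ""),
    (1, "a", "b", "c"),
    (2, "s", "", ""),
]

# nullif(col, '') turns '' into NULL, so NULL and '' are both treated as empty.
ROW_NUMBER_QUERY = """
select id, col1, col2, col3
from (select t.*,
             row_number() over (partition by id
                                order by ((case when nullif(col1, '') is not null then 1 else 0 end) +
                                          (case when nullif(col2, '') is not null then 1 else 0 end) +
                                          (case when nullif(col3, '') is not null then 1 else 0 end)
                                         ) desc
                               ) as seqnum
      from tablename t
     ) ranked
where seqnum = 1
order by id
"""


def fullest_row_per_id(rows):
    """Return exactly one row per id: the one with the most non-empty columns."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table tablename (id integer, col1 text, col2 text, col3 text)"
        )
        conn.executemany("insert into tablename values (?, ?, ?, ?)", rows)
        return conn.execute(ROW_NUMBER_QUERY).fetchall()
    finally:
        conn.close()
```

Unlike a join on the maximum count, row_number() keeps exactly one row per id. When several rows tie, which one is kept is arbitrary unless further columns are added to the order by.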
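You can use a subquery that computes the number of non-null columns for each row, order by that count descending, and join it back to the table, for example limited to the top 2 rows overall (the if function here is MySQL syntax):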
select m.*
from my_table m
INNER JOIN (
  select id,
         if (col1 is null, 0, 1) +
         if (col2 is null, 0, 1) +
         if (col3 is null, 0, 1) result
  from my_table
  order by result desc
  limit 2
) t on t.id = m.id