SQL Server duplicates

I need to find duplicates in a table named lead, which contains about 100k records. The duplicates have similar values in the company column, for example:

[image not available]

The goal is to keep only the latest leadid (95803 in this example). However, leadid 95803 has a problem: it has some extra characters after a space.

I tried the following script, but it does not give the desired result:

select t1.*
FROM [dbo].[LEAD] t1
LEFT JOIN (
    SELECT
        company,
        city,
        MAX(leadid) AS keep_leadid
    FROM [dbo].[LEAD]
    GROUP BY company, city
) t2 ON t1.company = t2.company AND t1.city = t2.city
WHERE t1.leadid <> t2.keep_leadid 
  AND t1.company LIKE '%Uvalde Country%'

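Any help refining the script to achieve the expected result would be greatly appreciated.

I want to delete everything except this:

[image not available]

There are many companies with different strings, and I want to apply the same script to all of them.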

sql sql-server sql-server-2008 duplicates
1 Answer

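Here is an attempt, with the caveats that it assumes you are interested in the shortest company name, matches it against companies whose names start with that shortest name, and does not take city into account: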

declare @t table([sid] int not null identity(1,1), leadid int, company varchar(80));

insert into @t values(1,'company A');
insert into @t values(30,'company A INC');
insert into @t values(5,'company B');
insert into @t values(9,'company C');
insert into @t values(48,'company C INC');

--query to see join on companies that start with the same string
select *
from 
    @t a
    LEFT join @t b on a.company = left(b.company, len(a.company))

;WITH CTE AS
(
    --get max id per company
    select MAX(case when a.leadid > b.leadid then a.leadid else b.leadid end) max_id, 
    case when len(a.company) < len(b.company) then a.company else b.company end company,
    case when len(a.company) < len(b.company) then len(a.company) else len(b.company) end len_company
    from 
        @t a
        inner join @t b on a.company = left(b.company, len(a.company))
    group by --group by shortest company?
        case when len(a.company) < len(b.company) then a.company else b.company end,
        case when len(a.company) < len(b.company) then len(a.company) else len(b.company) end
), CTE2 AS
(
    --ROW NUMBER ON SHORTEST COMPANIES TO DISCARD ONES WITH MORE CHARACTERS AT THE END
    select max_id, MAX(company) company, ROW_NUMBER() OVER(PARTITION BY max_id ORDER BY MIN(len_company)) rn
    from CTE
    group by max_id, len_company
)
SELECT max_id, company
FROM 
    CTE2 
where
    rn = 1;
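
The query above returns the leadid to keep for each company. As a possible follow-up, here is a minimal sketch of listing the rows that would be removed instead. It is not part of the original answer and rests on a few assumptions: the sample data is the same hypothetical @t table (without the sid column), a variant name is the base name followed by a space and extra characters, and the names base_company, base and ranked are made up for illustration. Because it uses LIKE, company names containing wildcard characters (%, _, [) would need escaping first.

```sql
-- Hypothetical sample data; in the question the real table is [dbo].[LEAD].
declare @t table(leadid int, company varchar(80));

insert into @t values(1,'company A');
insert into @t values(30,'company A INC');
insert into @t values(5,'company B');
insert into @t values(9,'company C');
insert into @t values(48,'company C INC');

;WITH base AS
(
    -- for each row, pick the shortest company name that either equals it
    -- or is followed by a space and extra characters (assumed variant rule)
    select t.leadid, t.company, s.base_company
    from @t t
    cross apply
    (
        select top (1) a.company as base_company
        from @t a
        where t.company = a.company
           or t.company like a.company + ' %'
        order by len(a.company)
    ) s
), ranked AS
(
    -- rank rows within each base name, latest (highest) leadid first
    select leadid, company, base_company,
           ROW_NUMBER() OVER(PARTITION BY base_company ORDER BY leadid DESC) rn
    from base
)
-- rn = 1 is the row to keep; everything else is a removal candidate
select leadid, company, base_company
from ranked
where rn > 1;
```

With the sample data this should list leadid 1 and leadid 9. Checking the output of this select before turning it into a delete is the safer route on a 100k-row table, since prefix matching can group companies that are not really duplicates.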