Need to merge each group into a single record, combining the data so that we have the most complete set of attributes

Question (votes: 1, answers: 3)
SELECT a.*
FROM MRSVoid.dbo.Customer_Dataset$ a
CROSS JOIN
    (SELECT [Customer_LastName]
          ,[Customer_FirstName]
          ,[Customer_AddressLine1]
          ,[Customer_HomePhone]
          ,[Customer_InternetEmail]
     FROM MRSVoid.dbo.Customer_Dataset$
     GROUP BY [Customer_LastName],
              [Customer_FirstName],
              [Customer_AddressLine1],
              [Customer_InternetEmail],
              [Customer_HomePhone]
     HAVING count(*) > 1) b
WHERE ((a.Customer_LastName = b.Customer_LastName) OR (a.Customer_LastName IS NULL AND b.Customer_LastName IS NULL))
  AND ((a.Customer_FirstName = b.Customer_FirstName) OR (a.Customer_FirstName IS NULL AND b.Customer_FirstName IS NULL))
  AND ((a.Customer_AddressLine1 = b.Customer_AddressLine1) OR (a.Customer_AddressLine1 IS NULL AND b.Customer_AddressLine1 IS NULL))
  AND ((a.Customer_InternetEmail = b.Customer_InternetEmail) OR (a.Customer_InternetEmail IS NULL AND b.Customer_InternetEmail IS NULL))
  AND ((a.Customer_HomePhone = b.Customer_HomePhone) OR (a.Customer_HomePhone IS NULL AND b.Customer_HomePhone IS NULL))
ORDER BY a.Customer_AddressLine1
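The WHERE clause above is a NULL-safe equality test: GROUP BY puts all NULLs of a column into one group, so the join back has to treat NULL as equal to NULL as well. A minimal way to try the same idea is an in-memory SQLite database through Python's standard sqlite3 module, where the IS operator compares NULL-safely (SQL Server 2022 and later offer IS NOT DISTINCT FROM for the same purpose). The table name `customer` and its reduced column set are assumptions; the two David rows come from the sample data, and the third row is invented as a non-duplicate.

```python
import sqlite3

# Hypothetical in-memory table with a subset of the question's columns.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    id INTEGER, last_name TEXT, first_name TEXT, address1 TEXT, email TEXT)""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        (27196, None, "David", "98 Pelmor Dr", None),   # from the sample data
        (14983, None, "David", "98 Pelmor Dr", None),   # from the sample data
        (30001, "Smith", "Ann", "1 Main St", "ann@example.com"),  # invented, not a duplicate
    ],
)

# Find groups with more than one row, then join back with NULL-safe IS,
# equivalent to "(a.x = b.x) OR (a.x IS NULL AND b.x IS NULL)".
DUPLICATES = """
SELECT a.id
FROM customer AS a
JOIN (
    SELECT last_name, first_name, address1, email
    FROM customer
    GROUP BY last_name, first_name, address1, email
    HAVING COUNT(*) > 1
) AS b
  ON a.last_name IS b.last_name
 AND a.first_name IS b.first_name
 AND a.address1 IS b.address1
 AND a.email IS b.email
ORDER BY a.id
"""
duplicate_ids = [row[0] for row in conn.execute(DUPLICATES)]
print(duplicate_ids)  # [14983, 27196]
```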

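This query gives me the duplicate rows from the dataset. Now I need to merge each group into a single record, combining the data so that we end up with the most complete set of attributes possible. For example:
a. If two duplicate records share an email address but only one has a complete mailing address, the merged record should contain both the email address and the mailing address.
b. If two duplicate records have different values for one of the attributes, the merged record should use the more recent value, as identified by the ModifiedOn and/or CreatedOn timestamps.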

Sample data

ID     CreatedOn                ModifiedOn               Customer_LastName  Customer_FirstName  Customer_AddressLine1  Customer_City  Customer_State  Customer_Zip  Customer_HomePhone  Customer_InternetEmail
27196  2012-11-14 18:51:07.000  2012-11-17 15:28:45.000  NULL               David               98 Pelmor Dr           Marmora        OR              85044         NULL                NULL
14983  2012-11-18 14:02:44.000  2012-11-18 14:02:44.000  NULL               David               98 Pelmor Dr           Marmora        OR              85044         NULL                NULL
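Rules a and b together amount to "for each column, take the most recent non-NULL value in the group". The sketch below states that rule in plain Python so it can be checked independently of any SQL dialect. The helper names and the two records are hypothetical; recency falls back from ModifiedOn to CreatedOn (mirroring ISNULL(ModifiedOn, CreatedOn)), and every record is assumed to have a CreatedOn.

```python
from datetime import datetime

def recency(record):
    # Mirrors ISNULL(ModifiedOn, CreatedOn): use ModifiedOn, else CreatedOn.
    if record["ModifiedOn"] is not None:
        return record["ModifiedOn"]
    return record["CreatedOn"]

def merge_duplicates(records, columns):
    """Return one merged record: for each column, the newest non-NULL value."""
    newest_first = sorted(records, key=recency, reverse=True)
    return {
        col: next((r[col] for r in newest_first if r[col] is not None), None)
        for col in columns
    }

# Hypothetical duplicate pair (values invented for illustration).
older = {"CreatedOn": datetime(2012, 11, 14, 18, 51, 7),
         "ModifiedOn": datetime(2012, 11, 17, 15, 28, 45),
         "Customer_InternetEmail": "david@example.com",
         "Customer_AddressLine1": None,           # rule a: filled from the newer row
         "Customer_HomePhone": "555-0100"}
newer = {"CreatedOn": datetime(2012, 11, 18, 14, 2, 44),
         "ModifiedOn": None,                      # falls back to CreatedOn
         "Customer_InternetEmail": "david@example.com",
         "Customer_AddressLine1": "98 Pelmor Dr",
         "Customer_HomePhone": "555-0199"}        # rule b: newer value wins

merged = merge_duplicates(
    [older, newer],
    ["Customer_InternetEmail", "Customer_AddressLine1", "Customer_HomePhone"],
)
print(merged)
```

Sorting once per group and scanning per column keeps the rule easy to audit; the SQL answers below approximate the same rule in different ways.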
Tags: sql asp.net sql-server tsql
3 answers
0 votes

You can use the ROW_NUMBER() window function:

WITH cte AS
(
    SELECT a.*
    FROM MRSVoid.dbo.Customer_Dataset$ a
    CROSS JOIN
        (SELECT [Customer_LastName]
              ,[Customer_FirstName]
              ,[Customer_AddressLine1]
              ,[Customer_HomePhone]
              ,[Customer_InternetEmail]
         FROM MRSVoid.dbo.Customer_Dataset$
         GROUP BY [Customer_LastName],
                  [Customer_FirstName],
                  [Customer_AddressLine1],
                  [Customer_InternetEmail],
                  [Customer_HomePhone]
         HAVING count(*) > 1) b
    WHERE ((a.Customer_LastName = b.Customer_LastName) OR (a.Customer_LastName IS NULL AND b.Customer_LastName IS NULL))
      AND ((a.Customer_FirstName = b.Customer_FirstName) OR (a.Customer_FirstName IS NULL AND b.Customer_FirstName IS NULL))
      AND ((a.Customer_AddressLine1 = b.Customer_AddressLine1) OR (a.Customer_AddressLine1 IS NULL AND b.Customer_AddressLine1 IS NULL))
      AND ((a.Customer_InternetEmail = b.Customer_InternetEmail) OR (a.Customer_InternetEmail IS NULL AND b.Customer_InternetEmail IS NULL))
      AND ((a.Customer_HomePhone = b.Customer_HomePhone) OR (a.Customer_HomePhone IS NULL AND b.Customer_HomePhone IS NULL))
)
-- Number the rows of each group, most recently modified first, and keep row 1.
SELECT *
FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Customer_LastName, Customer_FirstName, Customer_AddressLine1
                              ORDER BY ModifiedOn DESC) AS rn
    FROM cte
) A
WHERE rn = 1
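To see what "keep rn = 1" does, the sketch below runs the same ROW_NUMBER pattern against a hypothetical two-row group in an in-memory SQLite database (window functions need SQLite 3.25 or later). It returns the newest row whole: a value that exists only in the older row, here the zip code, is not carried over, so this pattern covers rule b but not rule a on its own.

```python
import sqlite3

# Window functions were added in SQLite 3.25.0.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite 3.25+ for ROW_NUMBER()")

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    id INTEGER, first_name TEXT, address1 TEXT, zip TEXT, modified_on TEXT)""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    (1, "David", "98 Pelmor Dr", "85044", "2012-11-17 15:28:45"),  # older, has zip
    (2, "David", "98 Pelmor Dr", None,    "2012-11-18 14:02:44"),  # newer, zip missing
])

# Number rows per group, newest first (ISO strings sort chronologically),
# then keep only the first row of each group.
LATEST = """
SELECT id, zip FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY first_name, address1
        ORDER BY modified_on DESC) AS rn
    FROM customer
) WHERE rn = 1
"""
latest = conn.execute(LATEST).fetchall()
print(latest)  # [(2, None)]
```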

0 votes

Not a complete solution, more of an idea:

SELECT t.CustomerName, q1.Email, q2.MailingAddress
FROM (
    -- names that occur more than once
    SELECT CustomerName
    FROM Customers
    GROUP BY CustomerName
    HAVING COUNT(*) > 1
) t
-- OUTER APPLY (rather than CROSS APPLY) keeps the group even when every row
-- has NULL in that column; the attribute then comes back as NULL.
OUTER APPLY (
    SELECT TOP 1 c1.Email
    FROM Customers c1
    WHERE c1.CustomerName = t.CustomerName
      AND c1.Email IS NOT NULL
    ORDER BY ISNULL(c1.ModifiedOn, c1.CreatedOn) DESC   -- most recent first
) q1
OUTER APPLY (
    SELECT TOP 1 c1.MailingAddress
    FROM Customers c1
    WHERE c1.CustomerName = t.CustomerName
      AND c1.MailingAddress IS NOT NULL
    ORDER BY ISNULL(c1.ModifiedOn, c1.CreatedOn) DESC
) q2
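The per-column "newest non-NULL value" idea can be tried without SQL Server. SQLite has no APPLY operator, so this sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1 for each column; it returns the same single value and yields NULL when no row qualifies, instead of dropping the group. The Customers table and its three rows are hypothetical, and COALESCE stands in for T-SQL's ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Customers (
    CustomerName TEXT, Email TEXT, MailingAddress TEXT,
    CreatedOn TEXT, ModifiedOn TEXT)""")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
    ("David", "david@example.com", None, "2012-11-14 18:51:07", "2012-11-17 15:28:45"),
    ("David", None, "98 Pelmor Dr",      "2012-11-18 14:02:44", None),
    ("David", "d.new@example.com", None, "2012-11-19 09:00:00", None),
])

# One correlated subquery per column: newest non-NULL value by
# COALESCE(ModifiedOn, CreatedOn), the SQLite spelling of ISNULL(...).
MERGED = """
SELECT t.CustomerName,
       (SELECT c1.Email FROM Customers AS c1
         WHERE c1.CustomerName = t.CustomerName AND c1.Email IS NOT NULL
         ORDER BY COALESCE(c1.ModifiedOn, c1.CreatedOn) DESC LIMIT 1) AS Email,
       (SELECT c1.MailingAddress FROM Customers AS c1
         WHERE c1.CustomerName = t.CustomerName AND c1.MailingAddress IS NOT NULL
         ORDER BY COALESCE(c1.ModifiedOn, c1.CreatedOn) DESC LIMIT 1) AS MailingAddress
FROM (SELECT CustomerName FROM Customers
      GROUP BY CustomerName HAVING COUNT(*) > 1) AS t
"""
merged_rows = conn.execute(MERGED).fetchall()
print(merged_rows)  # [('David', 'd.new@example.com', '98 Pelmor Dr')]
```

Each extra attribute needs its own subquery, which is verbose but keeps every column's choice independent, as rules a and b require.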

0 votes

To merge multiple rows into one per group, you should do something like this:

SELECT   MAX(id) AS Id,
         MAX(createdon) AS createdon,
         MAX(modifiedon) AS modifiedon
         -- ...other columns, each wrapped in MAX()
FROM     (
                -- YOUR CURRENT QUERY
                SELECT <YOUR SELECT HERE>
                FROM  ....
         ) t
GROUP BY <ColumnNameOnWhichYouWantToGroup>

The query above uses GROUP BY to collapse multiple rows into one. Use the aggregate function MAX to get the correct values.
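What MAX returns per column can be checked on a hypothetical two-row group in an in-memory SQLite database. MAX ignores NULLs, so a value present in only one duplicate survives (rule a). Each column's maximum is computed independently, though, so a text column returns its largest value rather than the value from the row with the latest timestamp, which matters for rule b when duplicates disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    id INTEGER, first_name TEXT, email TEXT, zip TEXT, modified_on TEXT)""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    (1, "David", "zed@example.com", None,    "2012-11-17 15:28:45"),  # older
    (2, "David", "amy@example.com", "85044", "2012-11-18 14:02:44"),  # newer
])

# Each MAX is taken over its own column; NULL zips are skipped.
row = conn.execute("""
SELECT first_name, MAX(id), MAX(email), MAX(zip), MAX(modified_on)
FROM customer
GROUP BY first_name
""").fetchone()
print(row)  # ('David', 2, 'zed@example.com', '85044', '2012-11-18 14:02:44')
```

Here the zip is filled in as intended, while the email comes from the older row because "zed" sorts after "amy".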

© www.soinside.com 2019 - 2024. All rights reserved.