MySQL: Merge duplicate rows in a table and update the pivot table entries that reference them

Question

I have two tables. The first one is participants:

+----+------------+-----------+
| id | First Name | Last Name |
+----+------------+-----------+
|  0 | John       | Snow      |
|  1 | John       | Snow      |
|  2 | Michael    | Jackson   |
+----+------------+-----------+

The second is a pivot table that links participants to events:

+----+----------------+----------+
| id | participant_id | event_id |
+----+----------------+----------+
|  0 |              0 |       12 |
|  1 |              1 |       35 |
|  2 |              2 |       35 |
+----+----------------+----------+

By mistake, the participants table contains duplicate entries.

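How can I delete the duplicate entries from the participants table and update the pivot table accordingly? The expected result would be: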

participants:

+----+------------+-----------+
| id | First Name | Last Name |
+----+------------+-----------+
|  0 | John       | Snow      |
|    |            |           | //deleted
|  2 | Michael    | Jackson   |
+----+------------+-----------+

pivot:

+----+----------------+----------+
| id | participant_id | event_id |
+----+----------------+----------+
|  0 |              0 |       12 |
|  1 |              0 |       35 | //participant_id changed from 1 to 0
|  2 |              2 |       35 |
+----+----------------+----------+
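
For anyone who wants to reproduce the situation, the sample data above could be set up roughly as follows. This is only a sketch: the table names (participants, pivot), the snake_case column names (first_name, last_name) and the column types are assumptions, since the question only shows the printed tables.

```sql
-- Hypothetical schema; names and types are assumed, not taken from a real dump.
CREATE TABLE participants (
  id         INT PRIMARY KEY,       -- no AUTO_INCREMENT, so id 0 is stored literally
  first_name VARCHAR(50) NOT NULL,
  last_name  VARCHAR(50) NOT NULL
);

CREATE TABLE pivot (
  id             INT PRIMARY KEY,
  participant_id INT NOT NULL,      -- refers to participants.id
  event_id       INT NOT NULL
);

-- Rows 0 and 1 are the duplicated participant.
INSERT INTO participants (id, first_name, last_name) VALUES
  (0, 'John',    'Snow'),
  (1, 'John',    'Snow'),
  (2, 'Michael', 'Jackson');

INSERT INTO pivot (id, participant_id, event_id) VALUES
  (0, 0, 12),
  (1, 1, 35),
  (2, 2, 35);
```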
Tags: mysql

1 Answer

This will be a multi-step process:

  • The first step is to update the mapping table pivot. The following query returns every duplicated name together with its first (lowest) id:
SELECT first_name, last_name, MIN(id) AS first_id 
FROM participants 
GROUP BY first_name, last_name 
HAVING COUNT(*) > 1; -- more than one row means duplicates exist

You can use the query above as a subquery and, through a series of joins, update the pivot table:

UPDATE pivot AS m 
JOIN participants AS p1 
  ON p1.id = m.participant_id 
JOIN (
       SELECT first_name, last_name, MIN(id) AS first_id 
       FROM participants 
       GROUP BY first_name, last_name 
       HAVING COUNT(*) > 1
     ) AS p2 ON p2.first_name = p1.first_name 
                AND p2.last_name = p1.last_name 
                AND p2.first_id <> p1.id  -- avoid the original row
SET m.participant_id = p2.first_id;  -- repoint the duplicate row's id to the first id
  • Now you can DELETE the duplicate rows, using the same subquery to find them:
DELETE p1 FROM participants AS p1 
JOIN (
       SELECT first_name, last_name, MIN(id) AS first_id 
       FROM participants 
       GROUP BY first_name, last_name 
       HAVING COUNT(*) > 1
     ) AS p2 ON p2.first_name = p1.first_name 
                AND p2.last_name = p1.last_name 
                AND p2.first_id <> p1.id;  -- avoid the original row
  • Finally, fix the problem at the data-definition level by defining a UNIQUE constraint on (first_name, last_name), so that it cannot happen again:
ALTER TABLE participants ADD CONSTRAINT unq_idx_name UNIQUE(first_name, last_name);
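
As a more cautious variant of the UPDATE and DELETE steps, the duplicate-to-first-id mapping can be computed once and the two changes run inside a transaction, so that they can be rolled back if a check fails. This is a sketch, not a tested migration: it assumes both tables use a transactional engine such as InnoDB, and dup_map is a hypothetical helper table name.

```sql
-- Hypothetical helper table: one row per duplicate, pairing its id
-- with the lowest id that shares the same first and last name.
CREATE TEMPORARY TABLE dup_map AS
SELECT p.id AS dup_id, k.first_id
FROM participants AS p
JOIN (
       SELECT first_name, last_name, MIN(id) AS first_id
       FROM participants
       GROUP BY first_name, last_name
       HAVING COUNT(*) > 1
     ) AS k ON k.first_name = p.first_name
           AND k.last_name = p.last_name
           AND k.first_id <> p.id;

START TRANSACTION;

-- Repoint pivot rows from each duplicate to the surviving participant.
UPDATE pivot AS m
JOIN dup_map AS d ON d.dup_id = m.participant_id
SET m.participant_id = d.first_id;

-- Remove the duplicate participants.
DELETE p FROM participants AS p
JOIN dup_map AS d ON d.dup_id = p.id;

-- Check 1: pivot rows that point at a participant that no longer exists.
SELECT m.id, m.participant_id
FROM pivot AS m
LEFT JOIN participants AS p ON p.id = m.participant_id
WHERE p.id IS NULL;

-- Check 2: the same participant linked to the same event more than once,
-- which can happen when two merged duplicates attended the same event.
SELECT participant_id, event_id, COUNT(*) AS n
FROM pivot
GROUP BY participant_id, event_id
HAVING COUNT(*) > 1;

COMMIT;  -- or ROLLBACK; if check 1 returned rows

DROP TEMPORARY TABLE dup_map;
```

Rows returned by check 2 are not necessarily an error, but they would need to be removed separately if each participant should appear only once per event. Because ALTER TABLE causes an implicit commit in MySQL, the UNIQUE constraint is best added only after the COMMIT.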