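I have a problem with my SQL query. The query should be simple: it is meant to take the duplicate rows in a table and insert them into another table, but it takes far too long and effectively never finishes. My table has about 5 million rows, but only 10,000 of them are duplicates. My SQL query looks like this: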
WITH duplicates AS (
SELECT title, price, product_type, category_id, COUNT(*)
FROM phones
GROUP BY title, price, product_type, category_id
HAVING COUNT(*) > 1),
ordered_duplicates AS
(SELECT p.*, ROW_NUMBER() OVER (PARTITION BY p.title, p.price, p.product_type, p.category_id ORDER BY p.id) as row_number
FROM products AS p
WHERE (p.title, p.price, p.product_type, p.category_id)
IN (SELECT title, price, product_type, category_id FROM duplicates))
INSERT INTO products_backup
SELECT d.id,
d.price,
d.quantity
FROM ordered_duplicates AS d
WHERE d.row_number > 1
ORDER BY d.id;
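Every time I run this query it takes more than an hour to execute, which is terrible :(
I have tried several variations, but I can't get it to work.
I also know there must be a better way to write it.
That's why I'm here.
Cheers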
To optimize this, for the first part you can rewrite your query as follows:
WITH duplicates AS (
SELECT title, price, product_type, category_id, COUNT(*)
FROM phones
GROUP BY title, price, product_type, category_id
HAVING COUNT(*) > 1),
ordered_duplicates AS
(SELECT p.id, p.price, p.quantity,
MIN(p.id) OVER (PARTITION BY p.title, p.price, p.product_type, p.category_id) as min_id
FROM products AS p
WHERE (p.title, p.price, p.product_type, p.category_id)
IN (SELECT title, price, product_type, category_id FROM duplicates))
INSERT INTO products_backup
SELECT d.id,
d.price,
d.quantity
FROM ordered_duplicates AS d
WHERE d.id <> d.min_id;
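As a further sketch, not something I have run against your data: the duplicates CTE and the IN filter may be unnecessary. Any row numbered 2 or higher within its partition already belongs to a group with more than one row, so a single pass with the window function should select the same rows. This assumes PostgreSQL, assumes both CTEs were meant to read from products, and uses a hypothetical index name (idx_products_dup_key). One difference to be aware of: groups whose key columns contain NULL are skipped by the row-wise IN comparison in the original but are treated as duplicates of each other here.

```sql
-- Assumption: PostgreSQL; products is the table holding the duplicates.
-- Hypothetical index name. A composite index on the grouping columns plus id
-- may let the planner read rows already ordered by partition instead of sorting.
CREATE INDEX IF NOT EXISTS idx_products_dup_key
    ON products (title, price, product_type, category_id, id);

-- Single pass: number the rows inside each group of identical keys and
-- keep everything except the first row (lowest id) of each group.
INSERT INTO products_backup
SELECT d.id,
       d.price,
       d.quantity
FROM (
    SELECT p.id,
           p.price,
           p.quantity,
           ROW_NUMBER() OVER (
               PARTITION BY p.title, p.price, p.product_type, p.category_id
               ORDER BY p.id
           ) AS rn
    FROM products AS p
) AS d
WHERE d.rn > 1;
```

It is worth comparing the plans with EXPLAIN before and after creating the index, since whether the index is actually used depends on your data and settings.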