Given the following dummy schema representing products in MySQL v5.6.41:
------------------------------------------------
| id | name | vendor_id | vendor_sku | upc | ean |
|----|------|-----------|------------|-----|-----|
| 1 | AAAA | 2 | 5678 | 456 | 111 | [1]
| 2 | aaaa | 2 | 7878 | 789 | 222 | [1]
| 3 | bbbb | 2 | 1234 | 111 | 333 | [2]
| 4 | cccc | 2 | 1234 | 222 | 444 | [2]
| 5 | dddd | 2 | 1111 | 123 | 555 | [3]
| 6 | eeee | 2 | 2222 | 123 | 666 | [3]
| 7 | ffff | 2 | 3333 | 333 | 777 | [4]
| 8 | gggg | 2 | 4444 | 444 | 777 | [4]
| 9 | hhhh | 2 | 5555 | 555 | 888 |
| 10 | iiii | 2 | 6666 | 666 | 999 |
| 11 | jjjj | 2 | 7777 | 777 | 000 |
| 12 | kkkk | 2 | 8888 | 888 | 001 |
| 13 | llll | 2 | 9999 | 999 | 002 |
| 14 | mmmm | 2 | 0000 | 000 | 003 |
------------------------------------------------
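I'm trying to count the rows that are duplicates under any one of the following conditions:
same vendor_id and same vendor_sku, OR
same vendor_id and same name (case-insensitive), OR
same vendor_id and same upc, OR
same vendor_id and same ean
(The [n] notation next to a row indicates which condition those rows are duplicates on.)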
I have put together this query so far, but it only matches condition #1:
SELECT
count(*)
FROM
my_table
GROUP BY
vendor_id, vendor_sku
HAVING
COUNT(*) > 1
My expected result for this example would be 8.
I think exists might work:
select count(*)
from my_table t
where exists (select 1
from my_table t2
where t2.vendor_id = t.vendor_id and
t2.id <> t.id and
(t2.vendor_sku = t.vendor_sku or
t2.name = t.name or
t2.upc = t.upc or
t2.ean = t.ean
)
);
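Note that case sensitivity depends on your collation. I have not added explicit case handling (I would just use lower()), because it is unclear whether such handling is necessary.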
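If the collation turns out to be case-sensitive, the name comparison can be made explicit with lower(). The following is only a sketch for checking the logic, not part of the original answer: it assumes Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (SQLite compares text case-sensitively by default, so lower() matters there), and it stores vendor_sku, upc and ean as text, which is a guess about the real column types. The rows are the sample rows from the question.

```python
import sqlite3

# Sample rows from the question: (id, name, vendor_id, vendor_sku, upc, ean).
# Storing the codes as text is an assumption; the real column types are not given.
ROWS = [
    (1, "AAAA", 2, "5678", "456", "111"),
    (2, "aaaa", 2, "7878", "789", "222"),
    (3, "bbbb", 2, "1234", "111", "333"),
    (4, "cccc", 2, "1234", "222", "444"),
    (5, "dddd", 2, "1111", "123", "555"),
    (6, "eeee", 2, "2222", "123", "666"),
    (7, "ffff", 2, "3333", "333", "777"),
    (8, "gggg", 2, "4444", "444", "777"),
    (9, "hhhh", 2, "5555", "555", "888"),
    (10, "iiii", 2, "6666", "666", "999"),
    (11, "jjjj", 2, "7777", "777", "000"),
    (12, "kkkk", 2, "8888", "888", "001"),
    (13, "llll", 2, "9999", "999", "002"),
    (14, "mmmm", 2, "0000", "000", "003"),
]

# The exists query from above, with lower() added around the name comparison.
QUERY = """
select count(*)
from my_table t
where exists (select 1
              from my_table t2
              where t2.vendor_id = t.vendor_id and
                    t2.id <> t.id and
                    (t2.vendor_sku = t.vendor_sku or
                     lower(t2.name) = lower(t.name) or
                     t2.upc = t.upc or
                     t2.ean = t.ean
                    )
             );
"""


def count_duplicates(rows):
    """Load rows into an in-memory table and return the duplicate count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table ("
        "id INTEGER PRIMARY KEY, name TEXT, vendor_id INTEGER, "
        "vendor_sku TEXT, upc TEXT, ean TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", rows)
    (count,) = conn.execute(QUERY).fetchone()
    conn.close()
    return count


print(count_duplicates(ROWS))  # prints 8 for the sample rows
```

On the sample data this counts rows 1 through 8, which matches the expected result in the question. Whether lower() is needed in MySQL depends on the collation of the name column.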
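I still think there are ways to do this without a correlated subquery. When I manage to get rid of a correlated subquery, the execution plan usually gets better.
So: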
SELECT
COUNT(DISTINCT t1.id)
FROM
my_table AS t1
INNER JOIN my_table AS t2 ON (
t1.vendor_id = t2.vendor_id
AND t1.id != t2.id
AND (
t1.vendor_sku = t2.vendor_sku
OR t1.name = t2.name
OR t1.upc = t2.upc
OR t1.ean = t2.ean
)
)
Or:
SELECT
COUNT(DISTINCT t1.id)
FROM
my_table AS t1
LEFT JOIN my_table AS t2 ON (
t1.vendor_id = t2.vendor_id
AND t1.id != t2.id
AND (
t1.vendor_sku = t2.vendor_sku
OR t1.name = t2.name
OR t1.upc = t2.upc
OR t1.ean = t2.ean
)
)
WHERE
t2.id IS NOT NULL
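The DISTINCT in COUNT(DISTINCT t1.id) matters because a self-join produces one output row per matching pair, so a row with several duplicates would otherwise be counted several times. A minimal sketch of that effect, assuming Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the four rows below are hypothetical and are not taken from the question.

```python
import sqlite3

# Hypothetical rows: ids 1-3 share the same upc, id 4 matches nothing.
ROWS = [
    (1, "a", 2, "s1", "123", "e1"),
    (2, "b", 2, "s2", "123", "e2"),
    (3, "c", 2, "s3", "123", "e3"),
    (4, "d", 2, "s4", "999", "e4"),
]

# Shared FROM/JOIN part of the inner-join query shown above.
JOIN = """
FROM
    my_table AS t1
    INNER JOIN my_table AS t2 ON (
        t1.vendor_id = t2.vendor_id
        AND t1.id != t2.id
        AND (
            t1.vendor_sku = t2.vendor_sku
            OR t1.name = t2.name
            OR t1.upc = t2.upc
            OR t1.ean = t2.ean
        )
    )
"""


def run_counts(rows):
    """Return (COUNT(*), COUNT(DISTINCT t1.id)) for the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table ("
        "id INTEGER PRIMARY KEY, name TEXT, vendor_id INTEGER, "
        "vendor_sku TEXT, upc TEXT, ean TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", rows)
    plain = conn.execute("SELECT COUNT(*) " + JOIN).fetchone()[0]
    distinct = conn.execute("SELECT COUNT(DISTINCT t1.id) " + JOIN).fetchone()[0]
    conn.close()
    return plain, distinct


plain, distinct = run_counts(ROWS)
print(plain, distinct)  # prints "6 3": six joined pairs, three distinct rows
```

Each of the three rows sharing a upc pairs with the other two, giving six joined rows, while only three distinct ids are duplicates.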
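P.S. When the error was pointed out, I did not have time to fix my previous answer, so I marked it with a del tag instead of deleting it (sorry about that). I wanted to fix it later, but by then the answer had been deleted by a moderator.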