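How to better use logic gates on a pivot table in MySQL

Question  Votes: 0  Answers: 1

The goal is to query data using logic based on a pivot table. I want to support six logic gates [AND, OR, XOR, NAND, NOR, XNOR], so that when users create a custom filter they can supply a logic gate and the tags.id values to use in that logic.

Example data: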

+----------+
| data     |
+----+-----+
| id | ... |
+----+-----+
| s  | ... |
| t  | ... |
| u  | ... |
| v  | ... |
| w  | ... |
| x  | ... |
| y  | ... |
| z  | ... |
+----+-----+

+---------+
| pivot   |
+----+----+
| c1 | c2 |
+----+----+
| t  | a  |
| t  | b  |
| t  | c  |
| u  | a  |
| u  | b  |
| v  | b  |
| v  | c  |
| w  | a  |
| w  | c  |
| x  | a  |
| y  | b  |
| z  | c  |
+----+----+

+----------+
| tags     |
+----+-----+
| id | ... |
+----+-----+
| a  | ... |
| b  | ... |
| c  | ... |
| d  | ... |
+----+-----+

Expected output:

  • AND(a, b, c) = [t]
  • OR(a, b, c) = [t, u, v, w, x, y, z]
  • XOR(a, b, c) = [u, v, w, x, y, z]
  • NAND(a, b, c) = [s, u, v, w, x, y, z]
  • NOR(a, b, c) = [s]
  • XNOR(a, b, c) = [s, t]

Query representation:

  • AND(a, b, c) = SELECT * FROM data WHERE id IN (SELECT c1 FROM pivot WHERE c2='a') AND id IN (SELECT c1 FROM pivot WHERE c2='b') AND id IN (SELECT c1 FROM pivot WHERE c2='c')
  • OR(a, b, c) = SELECT * FROM data WHERE id IN (SELECT c1 FROM pivot WHERE c2 IN ('a', 'b', 'c'))
  • XOR(a, b, c) = SELECT * FROM data WHERE id IN (SELECT c1 FROM pivot WHERE c2 IN ('a', 'b', 'c')) AND !(id IN (SELECT c1 FROM pivot WHERE c2='a') AND id IN (SELECT c1 FROM pivot WHERE c2='b') AND id IN (SELECT c1 FROM pivot WHERE c2='c'))
  • NAND(a, b, c) = SELECT * FROM data WHERE !(id IN (SELECT c1 FROM pivot WHERE c2='a') AND id IN (SELECT c1 FROM pivot WHERE c2='b') AND id IN (SELECT c1 FROM pivot WHERE c2='c'))
  • NOR(a, b, c) = SELECT * FROM data WHERE id NOT IN (SELECT c1 FROM pivot WHERE c2 IN ('a', 'b', 'c'))
  • XNOR(a, b, c) = SELECT * FROM data WHERE id NOT IN (SELECT c1 FROM pivot WHERE c2 IN ('a', 'b', 'c')) OR (id IN (SELECT c1 FROM pivot WHERE c2='a') AND id IN (SELECT c1 FROM pivot WHERE c2='b') AND id IN (SELECT c1 FROM pivot WHERE c2='c'))

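The data table currently has about 30,000 rows and the tags table about 15,000. I am populating the pivot table with 100,000 rows to run some tests on the queries above. I suspect the multiple IN() clauses will slow things down.

Can these queries be tuned with views, joins, or other MySQL operations?

I am also happy to hear suggestions for a better structure. I previously tried a JSON field in data to avoid the pivot table, but that turned out to be very slow.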

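EDIT: Although the OR(...) and NOR(...) queries do run well (~80 ms), the AND(...) query performs poorly (~1300 ms). Looking at EXPLAIN and trying to follow the MySQL subquery optimization suggestions, using DISTINCT ... INNER JOIN to build a better single subquery, actually made things worse.

Through testing I stumbled on the fact that intersecting multiple ID lists in Node with lodash's _.intersection(...) is faster than doing the intersection in pure MySQL.

So instead of forming the AND(...) logic with subqueries, I can run the individual subqueries against MySQL separately, intersect their results in the API itself with lodash, and then produce a single list to use in a final IN(...) statement for the final filtering.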
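The split described above might look roughly like this on the database side. This is only a sketch: the literal list in the last statement stands in for whatever the application-side intersection returns, and with the example data that list would contain only 't'.

```sql
-- One cheap lookup per tag; the API intersects the three result lists.
SELECT c1 FROM pivot WHERE c2 = 'a';   -- example data: t, u, w, x
SELECT c1 FROM pivot WHERE c2 = 'b';   -- example data: t, u, v, y
SELECT c1 FROM pivot WHERE c2 = 'c';   -- example data: t, v, w, z

-- Final filter, built from the intersected list (here just 't').
SELECT * FROM data WHERE id IN ('t');
```

If the intersection is empty, the application would need to skip the final statement, since IN () with an empty list is not valid MySQL syntax.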

mysql logic query-performance
1 Answer
0 votes

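OR(a, b, c) = [t, u, v, w, x, y, z]

SELECT GROUP_CONCAT(DISTINCT c1) FROM pivot WHERE c2 IN ('a', 'b', 'c');

The rest are messier, so let me ask how many different c1 and c2 values can exist. If it is no more than 64, you can do Boolean operations on the bits of a BIGINT UNSIGNED.

See also the data types SET and ENUM.

If you have MySQL 8.0, the bit operators also work on BLOB, hence well beyond 64 bits.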
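As a rough sketch of the bit-mask idea, assume a hypothetical tags.bit_pos column (values 0..63) that assigns each tag one bit. That column is not part of the original schema, and the numbering a=0, b=1, c=2 is only for illustration.

```sql
-- Assumption: tags.bit_pos is a hypothetical column, one bit (0..63) per tag.
SET @wanted = (1 << 0) | (1 << 1) | (1 << 2);   -- mask for tags a, b, c

SELECT m.id
FROM (
    SELECT d.id,
           COALESCE(BIT_OR(1 << t.bit_pos), 0) AS mask   -- 0 for untagged rows
    FROM data AS d
    LEFT JOIN pivot AS p ON p.c1 = d.id
    LEFT JOIN tags  AS t ON t.id = p.c2
    GROUP BY d.id
) AS m
WHERE (m.mask & @wanted) = @wanted;   -- AND gate
-- OR:   (m.mask & @wanted) <> 0
-- NOR:  (m.mask & @wanted) =  0
-- NAND: (m.mask & @wanted) <> @wanted
```

With roughly 15,000 tags this would not fit in 64 bits, so it only applies if the set of filterable tags is small.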

A VIEW is syntactic sugar, not a performance tool.

IN ( SELECT ... ) is often very inefficient; try to avoid it by using EXISTS ( SELECT 1 ... ) or LEFT JOIN (or JOIN).
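Applied to the question's queries, replacing IN ( SELECT ... ) might look like the following sketch. Whether it is actually faster depends on the MySQL version and on the indexes on pivot, so it is worth checking with EXPLAIN.

```sql
-- AND(a, b, c) with correlated EXISTS instead of three IN subqueries.
SELECT d.*
FROM data AS d
WHERE EXISTS (SELECT 1 FROM pivot AS p WHERE p.c1 = d.id AND p.c2 = 'a')
  AND EXISTS (SELECT 1 FROM pivot AS p WHERE p.c1 = d.id AND p.c2 = 'b')
  AND EXISTS (SELECT 1 FROM pivot AS p WHERE p.c1 = d.id AND p.c2 = 'c');

-- NOR(a, b, c) as an anti-join: keep rows with no matching pivot entry.
SELECT d.*
FROM data AS d
LEFT JOIN pivot AS p
       ON p.c1 = d.id
      AND p.c2 IN ('a', 'b', 'c')
WHERE p.c1 IS NULL;
```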

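AND(a, b, c) = [t]

can be achieved thus:

SELECT GROUP_CONCAT(DISTINCT c1)
    FROM (
        SELECT c1
            FROM pivot
            WHERE c2 IN ('a', 'b', 'c')
            GROUP BY c1           -- count the matching tags per c1
            HAVING COUNT(*) = 3   -- the number of items in a,b,c
         ) AS x ;

Note:

c2 IN ('a', 'b', 'c')

can also be written in other forms, which might make it easier to turn your operators into stored routines.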
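One built-in that fits the remark about rewriting c2 IN (...) is FIND_IN_SET, which takes the tag list as a single comma-separated string; whether that is the form the answer had in mind is an assumption. Combined with the counting idea, a sketch covering all six gates could look like this, assuming each (c1, c2) pair appears at most once in pivot:

```sql
-- The tag list is one string, so a stored routine could accept it as a parameter.
SELECT d.id
FROM data AS d
LEFT JOIN pivot AS p
       ON p.c1 = d.id
      AND FIND_IN_SET(p.c2, 'a,b,c') > 0
GROUP BY d.id
HAVING COUNT(p.c2) = 3;            -- AND: all three tags matched
-- OR:   HAVING COUNT(p.c2) > 0
-- NOR:  HAVING COUNT(p.c2) = 0
-- NAND: HAVING COUNT(p.c2) < 3
-- XOR (as defined in the question):  HAVING COUNT(p.c2) BETWEEN 1 AND 2
-- XNOR (as defined in the question): HAVING COUNT(p.c2) IN (0, 3)
```

Because FIND_IN_SET wraps the column in a function call, MySQL probably cannot use an index on c2 for it, so this trades some speed for convenience compared with a plain IN list.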
