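The goal is to query data using logic based on a pivot table. I want to support six logic gates [AND, OR, XOR, NAND, NOR, XNOR], so that when users create a custom filter they can supply a logic gate and the tags.id values to use in that logic.
Example data: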
+----------+
|   data   |
+----+-----+
| id | ... |
+----+-----+
| s  | ... |
| t  | ... |
| u  | ... |
| v  | ... |
| w  | ... |
| x  | ... |
| y  | ... |
| z  | ... |
+----+-----+

+---------+
|  pivot  |
+----+----+
| c1 | c2 |
+----+----+
| t  | a  |
| t  | b  |
| t  | c  |
| u  | a  |
| u  | b  |
| v  | b  |
| v  | c  |
| w  | a  |
| w  | c  |
| x  | a  |
| y  | b  |
| z  | c  |
+----+----+

+----------+
|   tags   |
+----+-----+
| id | ... |
+----+-----+
| a  | ... |
| b  | ... |
| c  | ... |
| d  | ... |
+----+-----+
Expected output:
AND(a, b, c)
= [t]
OR(a, b, c)
= [t, u, v, w, x, y, z]
XOR(a, b, c)
= [u, v, w, x, y, z]
NAND(a, b, c)
= [s, u, v, w, x, y, z]
NOR(a, b, c)
= [s]
XNOR(a, b, c)
= [s, t]
Query representations:
AND(a, b, c)
= SELECT * FROM data WHERE id IN (SELECT c1 FROM pivot WHERE c2='a') AND id IN (SELECT c1 FROM pivot WHERE c2='b') AND id IN (SELECT c1 FROM pivot WHERE c2='c')
OR(a, b, c)
= SELECT * FROM data WHERE id IN (SELECT c1 FROM pivot WHERE c2 IN ('a', 'b', 'c'))
XOR(a, b, c)
= SELECT * FROM data WHERE id IN (SELECT c1 FROM pivot WHERE c2 IN ('a', 'b', 'c')) AND NOT (id IN (SELECT c1 FROM pivot WHERE c2='a') AND id IN (SELECT c1 FROM pivot WHERE c2='b') AND id IN (SELECT c1 FROM pivot WHERE c2='c'))
NAND(a, b, c)
= SELECT * FROM data WHERE NOT (id IN (SELECT c1 FROM pivot WHERE c2='a') AND id IN (SELECT c1 FROM pivot WHERE c2='b') AND id IN (SELECT c1 FROM pivot WHERE c2='c'))
NOR(a, b, c)
= SELECT * FROM data WHERE id NOT IN (SELECT c1 FROM pivot WHERE c2 IN ('a', 'b', 'c'))
XNOR(a, b, c)
= SELECT * FROM data WHERE id NOT IN (SELECT c1 FROM pivot WHERE c2 IN ('a', 'b', 'c')) OR (id IN (SELECT c1 FROM pivot WHERE c2='a') AND id IN (SELECT c1 FROM pivot WHERE c2='b') AND id IN (SELECT c1 FROM pivot WHERE c2='c'))
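Since users supply the gate and the tag ids, the six query shapes above can be generated rather than hand-written. Here is a minimal sketch of that idea, assuming Python's built-in sqlite3 as a stand-in for MySQL so the sample data can be checked end to end. The names make_db, build_query and run_gate are hypothetical helpers, not part of any library, and the tag ids are bound as parameters instead of being pasted into the SQL.

```python
import sqlite3

PIVOT = [("t", "a"), ("t", "b"), ("t", "c"), ("u", "a"), ("u", "b"), ("v", "b"),
         ("v", "c"), ("w", "a"), ("w", "c"), ("x", "a"), ("y", "b"), ("z", "c")]


def make_db():
    """Load the sample data and pivot rows from the question into SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE data (id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE pivot (c1 TEXT, c2 TEXT, PRIMARY KEY (c1, c2))")
    db.executemany("INSERT INTO data VALUES (?)", [(i,) for i in "stuvwxyz"])
    db.executemany("INSERT INTO pivot VALUES (?, ?)", PIVOT)
    return db


def build_query(gate, tags):
    """Return (sql, params) for one gate, mirroring the queries listed above."""
    tags = list(tags)
    if not tags:
        raise ValueError("at least one tag id is required")
    marks = ", ".join("?" for _ in tags)
    # "has at least one of the tags" and "has every one of the tags"
    any_tag = f"id IN (SELECT c1 FROM pivot WHERE c2 IN ({marks}))"
    all_tags = " AND ".join(
        "id IN (SELECT c1 FROM pivot WHERE c2 = ?)" for _ in tags)
    conditions = {
        "AND": (all_tags, tags),
        "OR": (any_tag, tags),
        "XOR": (f"{any_tag} AND NOT ({all_tags})", tags + tags),
        "NAND": (f"NOT ({all_tags})", tags),
        "NOR": (f"NOT ({any_tag})", tags),
        "XNOR": (f"NOT ({any_tag}) OR ({all_tags})", tags + tags),
    }
    where, params = conditions[gate]
    return f"SELECT id FROM data WHERE {where} ORDER BY id", params


def run_gate(db, gate, tags):
    """Execute the generated query and return the matching data ids."""
    sql, params = build_query(gate, tags)
    return [row[0] for row in db.execute(sql, params)]
```

Calling run_gate(make_db(), "XOR", ["a", "b", "c"]) runs the "at least one tag but not all" form used in this question, which is not the parity definition of a multi-input XOR.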
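data currently holds about 30,000 rows and tags about 15,000. I am populating the pivot table with 100,000 rows to test the queries above. I suspect the multiple IN() clauses will slow things down.
Can these queries be tuned with views, joins, or some other MySQL feature?
I would also welcome suggestions for a better structure. I previously tried a JSON field in data to avoid the pivot table, but that turned out to be very slow.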
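EDIT: While the OR(...) and NOR(...) queries do run well (~80 ms), the AND(...) query performs poorly (~1300 ms). Looking at EXPLAIN and following the MySQL subquery optimization suggestions to build a single, better subquery with DISTINCT ... INNER JOIN actually made things worse.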
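Through testing I stumbled on the fact that lodash's _.intersection(...) can intersect multiple ID lists in Node faster than pure MySQL can.
So instead of using subqueries to form the AND(...) logic, I can pull the individual subqueries from MySQL, intersect them with lodash in the API itself, and then produce a single list for a final IN(...) statement that does the last filtering step.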
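The application-side intersection described here can be sketched as follows. This is only an illustration under assumptions: Python's set.intersection stands in for lodash's _.intersection, SQLite stands in for MySQL, and make_db, ids_with_tag and and_gate are hypothetical helper names.

```python
import sqlite3

PIVOT = [("t", "a"), ("t", "b"), ("t", "c"), ("u", "a"), ("u", "b"), ("v", "b"),
         ("v", "c"), ("w", "a"), ("w", "c"), ("x", "a"), ("y", "b"), ("z", "c")]


def make_db():
    """Load the sample data and pivot rows from the question into SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE data (id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE pivot (c1 TEXT, c2 TEXT, PRIMARY KEY (c1, c2))")
    db.executemany("INSERT INTO data VALUES (?)", [(i,) for i in "stuvwxyz"])
    db.executemany("INSERT INTO pivot VALUES (?, ?)", PIVOT)
    return db


def ids_with_tag(db, tag):
    """One small query per tag: the set of data ids carrying that tag."""
    rows = db.execute("SELECT c1 FROM pivot WHERE c2 = ?", (tag,))
    return {row[0] for row in rows}


def and_gate(db, tags):
    """Intersect the per-tag id sets in the application, then filter data once."""
    tags = list(tags)
    if not tags:
        raise ValueError("at least one tag id is required")
    common = set.intersection(*(ids_with_tag(db, tag) for tag in tags))
    if not common:
        return []  # nothing to look up, and an empty IN () is not valid in MySQL
    ids = sorted(common)
    marks = ", ".join("?" for _ in ids)
    sql = f"SELECT id FROM data WHERE id IN ({marks}) ORDER BY id"
    return [row[0] for row in db.execute(sql, ids)]
```

The trade-off is one round trip per tag plus a final lookup, so whether it beats a single query likely depends on how many ids each tag returns.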
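OR(a, b, c) = [t, u, v, w, x, y, z]:
SELECT GROUP_CONCAT(DISTINCT c1)
FROM pivot
WHERE c2 IN ('a', 'b', 'c');
The rest are rather messy, so let me ask how many distinct c1 and c2 values can exist. If there are no more than 64, you can perform Boolean operations on the bits of a BIGINT UNSIGNED.
See also the SET and ENUM data types.
If you have MySQL 8.0, the bit operators also work on BLOB values, so you can go far beyond 64.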
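To make the bit-operation idea concrete, here is a small sketch in plain Python, assuming each tag is assigned one bit and each data row stores the OR of its tags' bits (in MySQL that value would live in a BIGINT UNSIGNED column). TAG_BIT, build_masks and evaluate are hypothetical names used only for illustration.

```python
# One bit per tag; with at most 64 tags the mask fits in a BIGINT UNSIGNED.
TAG_BIT = {"a": 1 << 0, "b": 1 << 1, "c": 1 << 2, "d": 1 << 3}
DATA_IDS = list("stuvwxyz")
PIVOT = [("t", "a"), ("t", "b"), ("t", "c"), ("u", "a"), ("u", "b"), ("v", "b"),
         ("v", "c"), ("w", "a"), ("w", "c"), ("x", "a"), ("y", "b"), ("z", "c")]


def build_masks():
    """Fold the pivot rows into one integer bitmask per data id."""
    masks = {data_id: 0 for data_id in DATA_IDS}
    for c1, c2 in PIVOT:
        masks[c1] |= TAG_BIT[c2]
    return masks


# Each gate compares (mask & wanted) against 0 and against wanted itself.
GATES = {
    "AND": lambda m, w: (m & w) == w,
    "OR": lambda m, w: (m & w) != 0,
    "XOR": lambda m, w: (m & w) not in (0, w),   # some but not all, as in the question
    "NAND": lambda m, w: (m & w) != w,
    "NOR": lambda m, w: (m & w) == 0,
    "XNOR": lambda m, w: (m & w) in (0, w),
}


def evaluate(gate, tags):
    """Return the data ids whose tag mask satisfies the gate."""
    wanted = 0
    for tag in tags:
        wanted |= TAG_BIT[tag]
    masks = build_masks()
    return [i for i in DATA_IDS if GATES[gate](masks[i], wanted)]
```

Each gate reduces to one AND plus a comparison per row, so no joins or subqueries are needed once the masks exist; the cost moves to keeping the mask column in sync with the pivot data.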
VIEWs are syntactic sugar, not a performance tool.
IN ( SELECT ... ) is often inefficient; try to avoid it by using EXISTS ( SELECT 1 ... ) or LEFT JOIN (or JOIN).
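As an illustration of the EXISTS form, the AND and NOR gates might be rewritten as correlated subqueries like this. It is a sketch that assumes SQLite as a stand-in for MySQL; make_db, and_exists and nor_not_exists are hypothetical helper names, and whether this is actually faster should be confirmed with EXPLAIN on the real tables.

```python
import sqlite3

PIVOT = [("t", "a"), ("t", "b"), ("t", "c"), ("u", "a"), ("u", "b"), ("v", "b"),
         ("v", "c"), ("w", "a"), ("w", "c"), ("x", "a"), ("y", "b"), ("z", "c")]


def make_db():
    """Load the sample data and pivot rows from the question into SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE data (id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE pivot (c1 TEXT, c2 TEXT, PRIMARY KEY (c1, c2))")
    db.executemany("INSERT INTO data VALUES (?)", [(i,) for i in "stuvwxyz"])
    db.executemany("INSERT INTO pivot VALUES (?, ?)", PIVOT)
    return db


def and_exists(db, tags):
    """AND gate: one correlated EXISTS per tag instead of one IN (SELECT ...) per tag."""
    tags = list(tags)
    if not tags:
        raise ValueError("at least one tag id is required")
    clause = "EXISTS (SELECT 1 FROM pivot p WHERE p.c1 = d.id AND p.c2 = ?)"
    where = " AND ".join(clause for _ in tags)
    sql = f"SELECT d.id FROM data d WHERE {where} ORDER BY d.id"
    return [row[0] for row in db.execute(sql, tags)]


def nor_not_exists(db, tags):
    """NOR gate: NOT EXISTS replaces NOT IN (SELECT ...)."""
    tags = list(tags)
    if not tags:
        raise ValueError("at least one tag id is required")
    marks = ", ".join("?" for _ in tags)
    sql = ("SELECT d.id FROM data d WHERE NOT EXISTS "
           f"(SELECT 1 FROM pivot p WHERE p.c1 = d.id AND p.c2 IN ({marks})) "
           "ORDER BY d.id")
    return [row[0] for row in db.execute(sql, tags)]
```

With a composite index on pivot (c1, c2), each EXISTS can be answered by a single index probe per data row.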
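AND(a, b, c) = [t] can be implemented like this:
SELECT GROUP_CONCAT(DISTINCT c1)
FROM ( SELECT c1
       FROM pivot
       WHERE c2 IN ('a', 'b', 'c')
       GROUP BY c1
       HAVING COUNT(*) = 3   -- the number of items in a,b,c
     ) AS x;
Note: c2 IN ('a', 'b', 'c') can also be written as a test against a single comma-separated string, for example FIND_IN_SET(c2, 'a,b,c').
That may make it easier to turn your operators into stored routines.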
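The counting idea behind HAVING COUNT(*) = 3 can be stretched to cover all six gates in one query shape: LEFT JOIN data to the matching pivot rows, count the matches per data id, and compare that count with 0 and with the number of requested tags. The sketch below assumes SQLite as a stand-in for MySQL and a unique (c1, c2) pair in pivot; make_db, HAVING and run_gate are hypothetical names.

```python
import sqlite3

PIVOT = [("t", "a"), ("t", "b"), ("t", "c"), ("u", "a"), ("u", "b"), ("v", "b"),
         ("v", "c"), ("w", "a"), ("w", "c"), ("x", "a"), ("y", "b"), ("z", "c")]

# n is the number of distinct requested tags; COUNT(p.c2) ignores unmatched rows.
HAVING = {
    "AND": "COUNT(p.c2) = {n}",
    "OR": "COUNT(p.c2) > 0",
    "XOR": "COUNT(p.c2) > 0 AND COUNT(p.c2) < {n}",
    "NAND": "COUNT(p.c2) < {n}",
    "NOR": "COUNT(p.c2) = 0",
    "XNOR": "COUNT(p.c2) = 0 OR COUNT(p.c2) = {n}",
}


def make_db():
    """Load the sample data and pivot rows from the question into SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE data (id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE pivot (c1 TEXT, c2 TEXT, PRIMARY KEY (c1, c2))")
    db.executemany("INSERT INTO data VALUES (?)", [(i,) for i in "stuvwxyz"])
    db.executemany("INSERT INTO pivot VALUES (?, ?)", PIVOT)
    return db


def run_gate(db, gate, tags):
    """Evaluate one gate with a single LEFT JOIN ... GROUP BY ... HAVING query."""
    tags = sorted(set(tags))  # duplicates would break the count comparison
    if not tags:
        raise ValueError("at least one tag id is required")
    marks = ", ".join("?" for _ in tags)
    having = HAVING[gate].format(n=len(tags))  # n is an int from len(), not user text
    sql = ("SELECT d.id FROM data d "
           f"LEFT JOIN pivot p ON p.c1 = d.id AND p.c2 IN ({marks}) "
           f"GROUP BY d.id HAVING {having} ORDER BY d.id")
    return [row[0] for row in db.execute(sql, tags)]
```

Because only the HAVING clause changes between gates, this shape may also be the easiest one to wrap in a stored routine that takes the gate name and the tag list as arguments.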