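I'm trying to run a query against a table with 125 million rows. Each row is stored with a date, and I'm trying to select the data one month at a time. I'm using a query like this: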
select id from stats where page regexp '...' and timestamp between '2020-04-12' and '2020-05-12'
EXPLAIN shows:
+----+-------------+-------+-------+---------------------------+-----------+---------+------+----------+------------------------------------+
| id | select_type | table | type  | possible_keys             | key       | key_len | ref  | rows     | Extra                              |
+----+-------------+-------+-------+---------------------------+-----------+---------+------+----------+------------------------------------+
| 1  | SIMPLE      | stats | range | timestamp_video,timestamp | timestamp | 4       | NULL | 10257708 | Using index condition; Using where |
+----+-------------+-------+-------+---------------------------+-----------+---------+------+----------+------------------------------------+
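The estimated number of rows to examine seems high to me, because the date range itself matches fewer than half that many: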
select count(*) from stats where timestamp between '2020-04-12' and '2020-05-12';
This returns:
+----------+
| count(*) |
+----------+
|  4840392 |
+----------+
Table structure (relevant parts):
`page` text COLLATE utf8_unicode_ci NOT NULL,
`timestamp` date DEFAULT NULL,
KEY `timestamp_video` (`timestamp`,`video`),
KEY `timestamp` (`timestamp`)
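The page column contains entries of 1000+ characters. The timestamp_video index is not needed for this query. Is there a way to tell MySQL to ignore it and use only the single-column timestamp index?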
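For reference, the kind of index hint I have in mind would look roughly like the sketch below. It uses MySQL's IGNORE INDEX and USE INDEX hints; the '...' pattern is elided exactly as in the query above. Since EXPLAIN already reports key = timestamp, I would not necessarily expect either form to change the plan.

```sql
-- Sketch: ask the optimizer not to consider the composite index.
SELECT id
FROM stats IGNORE INDEX (timestamp_video)
WHERE page REGEXP '...'
  AND `timestamp` BETWEEN '2020-04-12' AND '2020-05-12';

-- Alternative sketch: restrict the optimizer to the single-column index.
SELECT id
FROM stats USE INDEX (`timestamp`)
WHERE page REGEXP '...'
  AND `timestamp` BETWEEN '2020-04-12' AND '2020-05-12';
```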
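Alternatively, is there a way to rewrite this with a subquery, so that the rows matching the date range are selected first and the regular expression is then applied only to those rows?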
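A minimal sketch of the subquery form I mean is below. The alias t is just an illustrative name, and the '...' pattern is elided as above. One caveat I'm aware of: MySQL 5.7 and later can merge a derived table like this back into the outer query (the derived_merge optimizer switch), so it may well end up with the same plan as the original.

```sql
-- Sketch: filter by date range in a derived table, then apply the regex.
SELECT t.id
FROM (
    SELECT id, page
    FROM stats
    WHERE `timestamp` BETWEEN '2020-04-12' AND '2020-05-12'
) AS t
WHERE t.page REGEXP '...';
```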
The query currently takes more than 19516 seconds to run. I'm trying to get it under 600.
Update
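Regex example: the pattern can be 12000+ characters long (the problem string is 12077 characters) and has the form access=(), where the parentheses contain 10-character alphanumeric strings separated by |.
Part of a full example: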
page regexp 'access=(3slaug6h82|5qew9gd4tn|o7vr3e9tix|5coakhoymq|1axg2vf8qt|7uh9ptld4v|vpgaix9wm8|0klcvjbrm8|x19ozupcre|fo2tjd7cxn)'
A sample page value might contain:
www.example.com/page?param1=true&access=3slaug6h82&param3=false&user=1234
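To make the match concrete, here is a small self-contained check of a shortened version of the pattern against the sample value. Only the first three alternatives are kept, purely for brevity, and the second URL is a made-up value that should not match.

```sql
-- Sketch: evaluate the shortened pattern against two literal strings.
SELECT
  'www.example.com/page?param1=true&access=3slaug6h82&param3=false&user=1234'
    REGEXP 'access=(3slaug6h82|5qew9gd4tn|o7vr3e9tix)' AS sample_matches,  -- 1
  'www.example.com/page?param1=true&access=zzzzzzzzzz&user=1234'
    REGEXP 'access=(3slaug6h82|5qew9gd4tn|o7vr3e9tix)' AS other_matches;   -- 0
```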
Using FULLTEXT(page):
SELECT ft FROM `desc_bug`
WHERE MATCH(ft) AGAINST('5qew9gd4tn|3slaug6h82|asdfasdfd' IN BOOLEAN MODE)