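I have a project that analyzes event logs. The goal is to take each timestamp in a parent table (Table 1) and compare it against other log tables within a 30-second interval.
Table 1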
id command datetime
--------------------- ------------------- ----------------------
1 cat 2018-11-03 23:29:31
2 nmap 2018-11-03 23:22:32
3 ssh 2018-11-03 23:22:40
Event log table
id raw datetime
--------------- ------------ --------------------
1 text 2018-11-03 23:23:10
2 text 2018-11-03 23:23:20
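So, based on the Table1 datetime, I want to output all event log rows that were triggered within a time interval (for example, 30 seconds) of it.
Right now I use this left join statement, which works fine for small tables (less than 1 MB):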
SELECT table1.command AS Command, table2.raw AS Nginx, table3.raw AS Apache
FROM table1
LEFT JOIN table2
  ON table1.datetime::timestamp >= table2.datetime::timestamp - interval '30 seconds'
 AND table1.datetime::timestamp <= table2.datetime::timestamp + interval '30 seconds'
LEFT JOIN table3
  ON table1.datetime::timestamp >= table3.datetime::timestamp - interval '1 seconds'
 AND table1.datetime::timestamp <= table3.datetime::timestamp + interval '30 seconds'
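It works correctly and gives me the output I want. The problem is that my tables have 200K+ rows, and the query takes a long time to execute. It is not critical that it runs really fast, but, for example, if I join 3 tables (Table1 from the example plus 2 other tables with 200K+ rows each), the query takes more than 5 hours.
Below is the EXPLAIN output to help you understand: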
Nested Loop Left Join  (cost=0.00..930881273799.88 rows=1202913100267 width=1819)
  Join Filter: ((b1.datetime >= (s1.datetime - '00:00:30'::interval)) AND (b1.datetime <= (s1.datetime + '00:00:30'::interval)))
  ->  Nested Loop Left Join  (cost=0.00..60290628.13 rows=36384533 width=1343)
        Join Filter: ((b1.datetime <= s2.datetime) AND (b1.datetime >= (s2.datetime - '00:00:30'::interval)))
        ->  Seq Scan on bash b1  (cost=0.00..75.13 rows=4013 width=34)
        ->  Materialize  (cost=0.00..28885.00 rows=81600 width=1317)
              ->  Seq Scan on suricata__alert s2  (cost=0.00..15089.00 rows=81600 width=1317)
  ->  Materialize  (cost=0.00..43131.25 rows=297550 width=492)
        ->  Seq Scan on suricata__http s1  (cost=0.00..22755.50 rows=297550 width=492)
Can I optimize the join statement? Or should I take a different approach (views, indexes)?
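An index can only be used if the WHERE condition has the form
<indexed expression> <operator> <constant>
where <operator> must belong to the operator class the index is defined with, and <constant> does not have to be a literal constant, but must have a fixed value for the duration of the index scan.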
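As a rough illustration of that rule, the sketch below contrasts a condition that buries the indexed column inside an expression with an equivalent one that leaves the column bare on one side. The table `demo_events` and the index `demo_events_datetime_idx` are hypothetical names made up for this example. The planner may still choose a sequential scan on a tiny or empty table, so treat the EXPLAIN calls as something to try rather than a guaranteed result.

```sql
-- Hypothetical demo table and index, not part of the original question.
CREATE TEMP TABLE demo_events (id int, raw text, datetime timestamp);
CREATE INDEX demo_events_datetime_idx ON demo_events (datetime);

-- Not index-friendly: the indexed column sits inside an expression,
-- so the left-hand side is not the indexed expression itself.
EXPLAIN
SELECT * FROM demo_events
WHERE datetime - interval '30 seconds' <= TIMESTAMP '2018-11-03 23:22:32';

-- Index-friendly: bare indexed column, a btree operator (<=),
-- and a right-hand side whose value is fixed for the whole scan.
EXPLAIN
SELECT * FROM demo_events
WHERE datetime <= TIMESTAMP '2018-11-03 23:22:32' + interval '30 seconds';
```

Both conditions select the same rows; only the second matches the `<indexed expression> <operator> <constant>` shape.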
So you should rewrite the query as
SELECT table1.command AS Command, table2.raw AS Nginx, table3.raw AS Apache
FROM table1
LEFT JOIN table2
  ON table2.datetime::timestamp
     BETWEEN table1.datetime::timestamp - interval '30 seconds'
         AND table1.datetime::timestamp + interval '30 seconds'
LEFT JOIN table3
  ON table3.datetime::timestamp
     BETWEEN table1.datetime::timestamp - interval '30 seconds'
         AND table1.datetime::timestamp + interval '30 seconds';
Make sure there are indexes on the datetime columns of table2 and table3.
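A minimal sketch of what those indexes might look like follows. The index names are hypothetical, and the sketch assumes the `datetime` columns are already of type `timestamp`. If they are stored as text instead, the `::timestamp` cast in the join condition is an expression on the column, and a plain index on the column would not match it.

```sql
-- Hypothetical index names; table and column names follow the example query.
-- Assumes table2.datetime and table3.datetime are of type timestamp.
CREATE INDEX table2_datetime_idx ON table2 (datetime);
CREATE INDEX table3_datetime_idx ON table3 (datetime);

-- Refresh planner statistics so the new indexes are costed sensibly.
ANALYZE table2;
ANALYZE table3;

-- Check the plan: look for an Index Scan (or Bitmap Index Scan) on the
-- inner side of each nested loop instead of Materialize over a Seq Scan.
EXPLAIN
SELECT table1.command AS Command, table2.raw AS Nginx, table3.raw AS Apache
FROM table1
LEFT JOIN table2
  ON table2.datetime BETWEEN table1.datetime - interval '30 seconds'
                         AND table1.datetime + interval '30 seconds'
LEFT JOIN table3
  ON table3.datetime BETWEEN table1.datetime - interval '30 seconds'
                         AND table1.datetime + interval '30 seconds';
```

For each row of table1 the bounds of the BETWEEN are fixed, which is what lets the inner index scan be driven by the outer row.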
Unless you restrict the number of rows retrieved from table1 with a WHERE condition, this may still be slow.
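One way to restrict table1 is to analyze one time window at a time. The sketch below assumes `timestamp` columns and uses a made-up one-hour window around the sample data; the boundary values are illustrative only.

```sql
-- Illustrative window: only table1 rows from a single hour are joined.
-- The boundary timestamps are assumptions chosen to cover the sample rows.
SELECT table1.command AS Command, table2.raw AS Nginx, table3.raw AS Apache
FROM table1
LEFT JOIN table2
  ON table2.datetime BETWEEN table1.datetime - interval '30 seconds'
                         AND table1.datetime + interval '30 seconds'
LEFT JOIN table3
  ON table3.datetime BETWEEN table1.datetime - interval '30 seconds'
                         AND table1.datetime + interval '30 seconds'
WHERE table1.datetime >= TIMESTAMP '2018-11-03 23:00:00'
  AND table1.datetime <  TIMESTAMP '2018-11-04 00:00:00';
```

Because the filter is on table1, the outer side of the left joins, it reduces the number of outer rows without changing which log rows match each remaining command.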