How do I process ZooKeeper log files with Python regular expressions?

I have ZooKeeper logs that look like this:

2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
        at java.lang.Thread.run(Thread.java:745)
2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002
2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded

I am trying to get the following result, with each entry (including any stack trace lines) grouped on its own:

log entry 1:
2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
        at java.lang.Thread.run(Thread.java:745)

log entry 2:
2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002

log entry 3:
2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded

I tried the following regular expression pattern:

import re

content = "2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n \
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n \
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n \
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n \
        at java.lang.Thread.run(Thread.java:745)\n \
2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n \
2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n \
"

pattern = re.compile("(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}.*)+",re.DOTALL|re.MULTILINE)

match = re.match(pattern, content)
for f in match.groups():
    print(f,"\nEND")

But it matches the entire content as one block:

2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
 EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
         at java.lang.Thread.run(Thread.java:745)
 2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002
 2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded

END

Does anyone know how to fix this? Any help is appreciated!

python regex logging apache-zookeeper
2 Answers
0 votes

Here is a working version of your attempt, slightly modified:

import re

content = """2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n \
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n \
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n \
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n \
        at java.lang.Thread.run(Thread.java:745)\n \
2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n \
2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n \
"""

# An entry starts at "timestamp [..] - LEVEL"; the lazy .*? then runs (across
# lines, because of re.DOTALL) until the next such header or the end of input.
logs = re.findall(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[.*?\] - (?:TRACE|DEBUG|INFO|WARN|ERROR).*?(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[.*?\] - (?:TRACE|DEBUG|INFO|WARN|ERROR)|$)', content, flags=re.DOTALL)
print(logs)

This prints (list items shown on separate lines for readability):

['2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n         at java.lang.Thread.run(Thread.java:745)\n ',
 '2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n ',
 '2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n ']

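The regex logic used here defines the start of a log entry as a timestamp, followed by a bracketed field, a dash, and one of the log levels (TRACE, DEBUG, INFO, WARN, or ERROR). In dot-all mode, the pattern then uses the lazy .*? to match across lines until it hits either the start of another log entry or the end of the input.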
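As a rough sketch of the same idea, the header pattern can be written once and reused for both the match and the lookahead, which keeps the regex readable. The names HEADER, ENTRY, split_entries, and sample below are made up for illustration and are not part of the answer above; the sample text is the log from the question.

```python
import re

# Header that marks the start of an entry: timestamp, bracketed field, dash, level.
HEADER = (r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
          r"\[.*?\] - (?:TRACE|DEBUG|INFO|WARN|ERROR)")

# Lazy body that stops right before the next header or at the very end of the text.
ENTRY = re.compile(HEADER + r".*?(?=" + HEADER + r"|\Z)", re.DOTALL)


def split_entries(text):
    """Return each log entry (header line plus continuation lines), stripped."""
    return [m.group(0).strip() for m in ENTRY.finditer(text)]


sample = "\n".join([
    "2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception",
    "EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket",
    "        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)",
    "        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)",
    "        at java.lang.Thread.run(Thread.java:745)",
    "2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002",
    "2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded",
    "",
])

entries = split_entries(sample)

# Print in the "log entry N:" layout the question asks for.
for number, entry in enumerate(entries, start=1):
    print(f"log entry {number}:")
    print(entry)
    print()
```

Using re.finditer instead of re.match matters here: re.match only tries once at the start of the string, so combined with a greedy .* under re.DOTALL it swallows the whole input, which is the behavior the question ran into.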


0 votes

You can try the following regex:

\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}(?:(?!\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})[\s\S])*


Explanation:

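  • \d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3} - matches a timestamp of the form XXXX-XX-XX XX:XX:XX,XXX, where X is a digit
  • (?:(?!\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})[\s\S])* - matches 0 or more occurrences of any character (including newlines), as long as that character does not begin another timestamp in the format described in the first point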
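A minimal sketch of how this pattern might be used from Python follows. The names TIMESTAMP, ENTRY, sample, and entries are assumptions chosen for illustration, and the sample text is the log from the question. Because [\s\S] already matches newlines, no re.DOTALL flag is needed.

```python
import re

# Timestamp of the form XXXX-XX-XX XX:XX:XX,XXX (same sub-pattern as above).
TIMESTAMP = r"\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}"

# A timestamp, then any characters one at a time, each checked by the negative
# lookahead so the match stops just before the next timestamp begins.
ENTRY = re.compile(TIMESTAMP + r"(?:(?!" + TIMESTAMP + r")[\s\S])*")

sample = "\n".join([
    "2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception",
    "EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket",
    "        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)",
    "        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)",
    "        at java.lang.Thread.run(Thread.java:745)",
    "2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002",
    "2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded",
    "",
])

# rstrip() drops the newline that separates one entry from the next.
entries = [m.group(0).rstrip() for m in ENTRY.finditer(sample)]

for number, entry in enumerate(entries, start=1):
    print(f"log entry {number}:")
    print(entry)
    print()
```

One caveat with this approach: the pattern splits on any timestamp of this shape, so a message body that happens to contain one would be cut into two entries. Matching the log level after the timestamp, as the other answer does, is a stricter boundary.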
