Python regex replacement to clean up domain names, using a dictionary of patterns


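In the output, each number in parentheses needs to be replaced with a period ("."). The parenthesized numbers at the start and end of the domain should be removed entirely, so the result has no leading or trailing period.

Can re.sub be used for this? If so, how?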

Code

import re

log = ["4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
       "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)"]

rx_dict = { 'query': re.compile(r'(?P<query>[\S]*)$') }

for item in log:
    for key, r_exp in rx_dict.items():
        print(f"{r_exp.search(item).group(1)}")

Output

(7)pagead2(17)googlesyndication(3)com(0)
(2)pg(3)cdn(5)viber(3)com(0)

Desired output

pagead2.googlesyndication.com
pg.cdn.viber.com

1 Answer

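One practical way to do this in Python: take the last whitespace-separated field of each log line, replace every parenthesized number with a period using re.sub, then strip the periods left over at both ends.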

log = ["4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
       "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)"]

import re

# Replace each "(N)" length marker with ".", then strip the "." left at both ends.
urls = [re.sub(r'\(\d+\)', '.', t.split()[-1]).strip('.') for t in log]

print(urls)

Output:

['pagead2.googlesyndication.com', 'pg.cdn.viber.com']
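The same substitution can also be combined with the rx_dict loop from the question. The following is only a sketch under that assumption: clean_domain and LABEL_LENGTH are names introduced here for illustration and are not part of the original code, and the pattern is the question's own with the redundant character class removed.

```python
import re

log = ["4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
       "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)"]

# Pattern from the question: the last run of non-whitespace characters on the line.
rx_dict = {'query': re.compile(r'(?P<query>\S*)$')}

# Hypothetical helper pattern: one parenthesized length marker such as "(7)" or "(17)".
LABEL_LENGTH = re.compile(r'\(\d+\)')


def clean_domain(raw):
    """Replace every "(N)" marker with "." and drop the periods left at both ends."""
    return LABEL_LENGTH.sub('.', raw).strip('.')


for item in log:
    for key, r_exp in rx_dict.items():
        match = r_exp.search(item)
        if match:
            # The dictionary key doubles as the named group in the pattern.
            print(clean_domain(match.group(key)))
```

Keeping the cleanup in its own function leaves rx_dict free to hold further patterns later, with the substitution applied only to the fields that need it.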