Extracting text from text files with Python


I have multiple text files, all named st.txt, with exactly the same format:

Combination :3   Tuple Number:3
Request Type:ADD
Firewall Type:JP
Firewall Policy Name :STI-CEP31

Rule Type: ALLOW

Requested Values:

Policy Name Suffix: 

Source IP: GRN

Source Groups:
10.151.2.0/24
10.151.1.0/24

Destination IP: Untrusted 
Destination Group:
169.176.39.0/24
169.176.38.0/24

Application(s):
Application Group:

Service Mode:Use Protocol/Ports
Service Group:

Protocol/Ports:

TCP      |       21099


Combination :5   Tuple Number:16
Request Type:ADD
Firewall Type:JP
Firewall Policy Name :STI-CEP31

Rule Type: ALLOW

Requested Values:

Policy Name Suffix: 

Source IP: GRN

Source Groups:
10.151.2.0/24
10.151.1.0/24


Destination IP: Untrusted 
Destination Group:
169.176.39.0/24
169.176.38.0/24
154.154.55.221
154.25.55.662
148.55.465.653

Application(s):
Application Group:

Service Mode:Use Protocol/Ports
Service Group:

Protocol/Ports:

TCP      |       219


Combination :100   Tuple Number:100
Request Type:ADD
Firewall Type:JP
Firewall Policy Name :STI-CEP31

Rule Type: ALLOW

Requested Values:

Policy Name Suffix: 

Source IP: GRN

Source Groups:
10.151.2.0/24
10.151.1.0/24

Destination IP: Untrusted 
Destination Group:
169.176.38.0/24
154.154.55.222
154.25.55.61
148.55.465.651

Application(s):
Application Group:

Service Mode:Use Protocol/Ports
Service Group:

Protocol/Ports:

TCP      |       210

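I am trying to write Python code that treats every line starting with Combination as the beginning of a block. For example, this text file contains three blocks (one starting with Combination :3, one with Combination :5, and one with Combination :100). Within each block, it should search for the IP "169.176.39.0/24". If an entry is found (for example, the blocks starting with 3 and 5 have an entry for 169.176.39.0/24 in the Destination Group section), it should print the Source Groups IPs, the Destination Group IPs, and the ports.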

So, for this file, the output should look like:

Combination :3   Tuple Number:3 # Source Groups: 10.151.2.0/24 10.151.1.0/24 # Destination Group: 169.176.39.0/24 169.176.38.0/24 # Ports # 21099

Combination :5   Tuple Number:16 # Source Groups: 10.151.2.0/24 10.151.1.0/24 # Destination Group: 169.176.39.0/24 169.176.38.0/24 154.154.55.221 154.25.55.662 148.55.465.653 # Ports # 219

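The code I tried only prints the Destination Group IPs, and I could not add the Combination line to the list.

The code I tried for pulling out the Destination Group IPs: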

with open("st.txt") as f:
    data = f.read()

# Split data into blocks
blocks = data.split('\n\n')

# Extract desired information from each block
for block in blocks:

    if "169.176.39.0/24" in block:
        lines = block.split('\n\n')
        for i, line in enumerate(lines):
            if "Destination Group:" in line:
                start = i + 1
                break
        result = " ".join(lines[start:]).strip()
        print(lines[0], result)

The code I tried for extracting the Combination and Tuple Number from the files:

import os

directory = os.curdir
search_string = "169.176.39.0/24"

for filename in os.listdir(directory):
    filepath = os.path.join(directory, filename)
    if os.path.isfile(filepath):
        with open(filepath, 'r') as file:
            header = ''
            for line in file:
                if line.startswith("Combination"):
                    header = line.strip()
                elif search_string in line:
                    o1 = header
                    print(f"{filename}#{o1}")
python extract text-parsing
1 answer

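The text file does not appear to be formatted in a way that lets you split it into meaningful blocks on a simple delimiter, such as the blank lines the question's code splits on.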

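Source Groups, Destination Group, and Protocol/Ports are all attributes whose values span multiple lines. I assume every line below the attribute name is a value unless it is empty or contains a colon; a colon indicates another single-line key:value pair (these are simply ignored).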
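
The rule above can be sketched as a small helper. This is only a minimal illustration under the stated assumption, not the full solution; the function name collect_values and the sample lines are made up for the example.

```python
def collect_values(lines, attribute):
    """Return the values listed under a multi-line attribute.

    Assumption: a value is any non-empty line without a colon that follows
    the attribute's header line; a line with a colon ends the section.
    """
    values = []
    in_section = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are skipped; they do not end the section
        if line.startswith(attribute):
            in_section = True   # header line of the attribute we want
        elif ":" in line:
            in_section = False  # another key:value pair begins
        elif in_section:
            values.append(line)
    return values


sample = [
    "Source IP: GRN",
    "",
    "Source Groups:",
    "10.151.2.0/24",
    "10.151.1.0/24",
    "",
    "Destination IP: Untrusted",
]
print(collect_values(sample, "Source Groups"))
# ['10.151.2.0/24', '10.151.1.0/24']
```

Note that the colon rule would misfire on values that themselves contain a colon (IPv6 addresses, for instance); the sample data has none.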

I assume a block starts/ends at a line beginning with Combination, or at the end of the file.
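
As an aside, the same block boundary can be expressed with a regular expression instead of a line-by-line loop. The sketch below is an alternative illustration, not the approach used in this answer; split_blocks is a hypothetical name, and splitting on an empty match requires Python 3.7 or later.

```python
import re


def split_blocks(text):
    """Split text into blocks that each begin with a 'Combination' line."""
    # The lookahead matches the empty position right before "Combination"
    # at the start of a line, so the header stays inside its block.
    parts = re.split(r"^(?=Combination)", text, flags=re.MULTILINE)
    blocks = [part.strip() for part in parts]
    # Drop the empty leading piece and any text before the first header.
    return [block for block in blocks if block.startswith("Combination")]


text = (
    "Combination :3   Tuple Number:3\n"
    "Request Type:ADD\n"
    "\n"
    "Combination :5   Tuple Number:16\n"
    "Request Type:ADD\n"
)
for block in split_blocks(text):
    print(block.splitlines()[0])
```

Each returned block is a single string, so the multi-line attributes inside it still have to be parsed separately.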

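In the code below, I build a list of dictionaries containing all blocks. I then build a second list containing only the blocks that include a particular IP address. Finally, the blocks are formatted and printed.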

filename = r"C:\Users\Desktop\Todd Bonzalez\st.txt"

blocks = []
attributes = ("Source Groups", "Destination Group", "Protocol/Ports")

with open(filename) as f:
    block = {}
    in_section = None
    for line in f:
        line = line.strip()
        if line:
            # Stop consuming values for multi-line attributes if the line 
            # contains a colon. It's assumed to be another single-line 
            # attribute:value pair.
            if ":" in line:
                in_section = None
            
            # A block starts/ends at a line starting with "Combination"
            # (or at the end of the file, handled after the loop)
            if line.startswith("Combination"):
                if block:
                    blocks.append(block)
                    block = {}
                
                block["block"] = line
                
            # Consume the line since we're in a multi-line attribute section
            elif in_section:
                values = block.setdefault(in_section, [])
                
                # We only want the port
                if in_section == "Protocol/Ports":
                    line = line.split("|", maxsplit=2)[1].strip()
                
                values.append(line)
                
            # Check if the line is the start of a multi-line attribute
            else:
                for attribute in attributes:
                    if line.startswith(attribute):
                        in_section = attribute
                        break

# The end of the file also ends the last block, so store it
# if it is not empty
if block:
    blocks.append(block)

# Create a new list containing only the blocks with a particular IP address
search_string = "169.176.39.0/24"
blocks_with_certain_ip = []
for block in blocks:
    # .get() avoids a KeyError for blocks that lack one of the sections
    if (search_string in block.get("Source Groups", [])
            or search_string in block.get("Destination Group", [])):
        blocks_with_certain_ip.append(block)

# Format and print the blocks as desired
for block in blocks_with_certain_ip:
    string = (f'{block["block"]} '
              f'# Source Groups: {" ".join(block["Source Groups"])} '
              f'# Destination Group: {" ".join(block["Destination Group"])} '
              f'# Ports # {" ".join(block["Protocol/Ports"])}')
    print(string)

Output:

Combination :3   Tuple Number:3 # Source Groups: 10.151.2.0/24 10.151.1.0/24 # Destination Group: 169.176.39.0/24 169.176.38.0/24 # Ports # 21099
Combination :5   Tuple Number:16 # Source Groups: 10.151.2.0/24 10.151.1.0/24 # Destination Group: 169.176.39.0/24 169.176.38.0/24 154.154.55.221 154.25.55.662 148.55.465.653 # Ports # 219
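
Since the question mentions several files, the same parsing can be wrapped in functions and run over a directory. The following is a sketch under the same assumptions as above; the names parse_blocks, format_block, and matching_lines, the "*.txt" pattern, and the filename# prefix (borrowed from the question's second snippet) are illustrative choices, not part of the original code.

```python
from pathlib import Path

MULTILINE_ATTRIBUTES = ("Source Groups", "Destination Group", "Protocol/Ports")


def parse_blocks(lines):
    """Parse an iterable of lines into a list of dicts, one per block."""
    blocks, block, in_section = [], {}, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            in_section = None  # a colon ends any multi-line section
        if line.startswith("Combination"):
            if block:
                blocks.append(block)
            block = {"block": line}
        elif in_section:
            if in_section == "Protocol/Ports":
                line = line.split("|")[-1].strip()  # keep only the port
            block.setdefault(in_section, []).append(line)
        else:
            # Start a multi-line section if the line names one
            in_section = next(
                (a for a in MULTILINE_ATTRIBUTES if line.startswith(a)), None
            )
    if block:
        blocks.append(block)
    return blocks


def format_block(block, filename):
    """Format one block as a single output line prefixed with the file name."""
    sources = " ".join(block.get("Source Groups", []))
    destinations = " ".join(block.get("Destination Group", []))
    ports = " ".join(block.get("Protocol/Ports", []))
    return (f'{filename}#{block.get("block", "")} '
            f"# Source Groups: {sources} "
            f"# Destination Group: {destinations} "
            f"# Ports # {ports}")


def matching_lines(directory, ip, pattern="*.txt"):
    """Return formatted lines for every block in the directory listing ip."""
    results = []
    for path in sorted(Path(directory).glob(pattern)):
        with open(path) as f:
            blocks = parse_blocks(f)
        for block in blocks:
            if (ip in block.get("Source Groups", [])
                    or ip in block.get("Destination Group", [])):
                results.append(format_block(block, path.name))
    return results
```

Calling matching_lines(".", "169.176.39.0/24") and printing each returned string would then list the matching blocks of every .txt file in the current directory.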