Split a large text file into multiple files based on a list of keywords in Python


I'm new to Python and am working on an assignment. I'm trying to split a 10,000-line text file into multiple files based on a list of keywords.

input.txt looks like this:

 Name: Apple
 Type: Fruits
 Description:...

 Name: Orange
 Type: Fruits
 Description:...

 Name: Yellow
 Type: Colour
 Description:...

 Name: Apple
 Type: Fruits
 Description:...

 Name: Orange
 Type: Fruits
 Description:...

 Name: Yellow
 Type: Colour
 Description:...
 

Keywords:

Apple
Orange
Yellow

Expected output files:

Apple.txt

 Type: Fruits
 Description:

Orange.txt

 Type: Fruits
 Description:

Yellow.txt

 Type: Colour
 Description:

However, my current code can only split the file when the key is 'Apple'. I don't know how to modify it to work with a list of keywords.

key = 'Apple'

outfile = None
fno = 0
lno = 0

with open('input.txt') as infile:
    while line := infile.readline():
        lno += 1
        if outfile is None:
            fno += 1
            outfile = open(f'{fno}.txt', 'w')
        outfile.write(line)
        
        if key in line:
            print(f'"{key}" found in line {lno}')
            outfile.close()
            outfile = None
if outfile:
    outfile.close()

Edit: it should output only the first record for each keyword.

python split keyword
1 answer

Here is a more idiomatic version of your code. It doesn't hard-code the keyword list; it simply picks up whatever follows "Name:".

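seen = set()      # keywords whose first record has already been written
outfile = None    # output file for the record being copied, if any

with open('input.txt') as infile:
    for line in infile:
        if line.startswith(' Name: '):
            # everything after the prefix, without the trailing newline
            keyword = line[len(' Name: '):].strip()
            if keyword not in seen:
                outfile = open(f'{keyword}.txt', 'w')
                seen.add(keyword)
            continue  # the expected output omits the Name line itself
        if outfile is not None:
            if line.strip() == '':
                # a blank line ends the current record
                outfile.close()
                outfile = None
            else:
                outfile.write(line)
# close the last file if the input does not end with a blank line
if outfile is not None:
    outfile.close()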

You never do anything useful with lno, but if you want it for some reason, the idiomatic way to get line numbers is:

    for lno, line in enumerate(infile, start=1):
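
As a small illustration of that idiom, the sketch below reports the line number of every Name line. The function name find_name_lines and its prefix parameter are hypothetical, not part of the original code; it takes any iterable of lines, so it can be tried without a file.

```python
def find_name_lines(lines, prefix=' Name: '):
    """Return (line_number, keyword) pairs for each line starting with prefix.

    find_name_lines and prefix are illustrative names, not from the question.
    """
    hits = []
    # enumerate(..., start=1) pairs each line with a 1-based line number
    for lno, line in enumerate(lines, start=1):
        if line.startswith(prefix):
            hits.append((lno, line[len(prefix):].strip()))
    return hits


sample = [' Name: Apple\n', ' Type: Fruits\n', '\n', ' Name: Yellow\n']
for lno, keyword in find_name_lines(sample):
    print(f'"{keyword}" found in line {lno}')
```

With a real file you would pass the open file object instead of the sample list.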

Your sample input.txt shows a space at the beginning of each line. If that is a transcription error, adjust the code accordingly.
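
If you do want to restrict the output to an explicit keyword list, as the question asks, one possible variant is sketched below. It assumes records are separated by blank lines, as in the sample, and it strips each line before testing it, so it works whether or not the leading space is really there. The function name split_first_records and the out_dir parameter are made up for this example.

```python
from pathlib import Path


def split_first_records(input_path, keywords, out_dir='.'):
    """Write the first record of each listed keyword to <out_dir>/<keyword>.txt.

    Returns the keywords for which a file was written, in order of appearance.
    split_first_records and out_dir are illustrative names, not from the question.
    """
    wanted = set(keywords)
    written = []
    outfile = None
    try:
        with open(input_path) as infile:
            for line in infile:
                stripped = line.strip()
                if stripped.startswith('Name:'):
                    # a new record starts: finish any record still open
                    if outfile is not None:
                        outfile.close()
                        outfile = None
                    keyword = stripped[len('Name:'):].strip()
                    # only the first record of a listed keyword is copied
                    if keyword in wanted and keyword not in written:
                        outfile = open(Path(out_dir) / f'{keyword}.txt', 'w')
                        written.append(keyword)
                    continue  # the Name line itself is not copied
                if outfile is not None:
                    if stripped == '':
                        # a blank line ends the current record
                        outfile.close()
                        outfile = None
                    else:
                        outfile.write(line)
    finally:
        if outfile is not None:
            outfile.close()
    return written


# Example call (assumes input.txt exists in the current directory):
# split_first_records('input.txt', ['Apple', 'Orange', 'Yellow'])
```

Records whose name is not in the list, and repeated records for a keyword already handled, are skipped because no output file is open while their lines are read.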
