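I want a Python program that prints the sections of a text file. A section is defined by a keyword taken from a list of words: it starts on the line containing that keyword and ends on the line just before the next section starts. For example, consider the following text file: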
word1
abcdef
ghis jsd sjdhd jshj
word2
dgjgj dhkjhf
khkhkjd
word23
dfjkg fjidkfh
word5
diow299 udhgbhdi
jkdkjd
word89
eyuiywiou299092
word3
...
...
...
The required output of the program is:
Sections Found: [word1, word2, word3, word5, word89]
**********word1--SECTION**********
line 1: word1
line 2: abcdef
line 3: ghis jsd sjdhd jshj
**********word2--SECTION**********
line 4: word2
line 5: dgjgj dhkjhf
line 6: khkhkjd
**********word3--SECTION**********
line 14: word3
line 15: ....
''' Suppose word4 is not found in the txt file then it should continue and move to next word found'''
**********word5--SECTION**********
line 9: word5
line 10: diow299 udhgbhdi
line 11: jkdkjd
...
...
...
...
'''Continue till the end of list of words '''
list_of_words = ['word1','word2','word3','word4','word5','word6',....]
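Find the starting line of each word in list_of_words and store these in a list.
Then find the end_line of each word by sorting that list, so that the nearest greater start line is easy to find.
Then print the sections found together with their line numbers: line_in_text_file
Code for getting the line numbers (how do I create a variable for each n in list_of_words?):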
for n in list_of_words:
    with open(file_txt, 'r', encoding="utf8") as f:
        data_file = f.readlines()
    for num, lines in enumerate(data_file, 1):
        if n in lines:
            start_line = num
        else:
            continue
Code for finding the number in the list of start lines that is closest to, and greater than, n_start_line (val):
def closest(array_list, val):
    array_list1 = [j for j in array_list if j > val]
    array_list1.sort()
    return array_list1[0]
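Note that closest raises an IndexError for the last section, because no start line is greater than val there. A minimal sketch of the same lookup using the standard-library bisect module, assuming the start lines are sorted once up front (next_start is a hypothetical name, and the numbers are made-up start lines):

```python
from bisect import bisect_right


def next_start(sorted_starts, val):
    """Return the smallest entry greater than val, or None if there is none."""
    i = bisect_right(sorted_starts, val)
    return sorted_starts[i] if i < len(sorted_starts) else None


starts = sorted([9, 1, 14, 4, 7])  # illustrative start lines only
print(next_start(starts, 4))   # 7
print(next_start(starts, 14))  # None
```

Returning None for the last section leaves it to the caller to decide where that section ends, for example at the last line of the file.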
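pyparsing has a generator function, scanString, which yields the matched tokens together with the start and end locations of each match. Using the start location, call pyparsing's lineno function to get the line number of the match.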
import pyparsing as pp
marker = pp.oneOf("word1 word2 word3 word4 word5 word23")
txt = """\
word1
abcdef
ghis jsd sjdhd jshj
word2
dgjgj dhkjhf
khkhkjd
word23
dfjkg fjidkfh
word5
diow299 udhgbhdi word2
jkdkjd
word89
eyuiywiou299092
word3
"""
previous = None
for t, s, e in (pp.LineStart() + marker | pp.StringEnd()).scanString(txt):
    current_line_number = pp.lineno(s, txt)
    if t:
        current = t[0]
        if previous is not None:
            print(previous, "ended on line", current_line_number - 1)
        print("found", current, "on line", current_line_number)
        previous = current
    else:
        if previous is not None:
            print(previous, "ended on line", current_line_number)
This prints:
found word1 on line 1
word1 ended on line 3
found word2 on line 4
word2 ended on line 6
found word23 on line 7
word23 ended on line 8
found word5 on line 9
word5 ended on line 13
found word3 on line 14
word3 ended on line 15
You should be able to take it from here.
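For reference, one possible way to go from start and end line numbers to the requested printout, as a minimal sketch that uses only the standard library rather than pyparsing. It assumes a section header is a line consisting of exactly one keyword, and find_sections and format_sections are hypothetical helper names. Unlike the pyparsing version above, it counts the last section as ending on the last line of text (14, not 15).

```python
def find_sections(lines, words):
    """Map each keyword to (start_line, end_line), 1-based and inclusive."""
    wanted = set(words)
    # Assumption: a header line contains exactly one keyword and nothing else.
    starts = [(num, line.strip()) for num, line in enumerate(lines, 1)
              if line.strip() in wanted]
    sections = {}
    for i, (start, word) in enumerate(starts):
        # A section ends just before the next header, or at the last line.
        end = starts[i + 1][0] - 1 if i + 1 < len(starts) else len(lines)
        sections.setdefault(word, (start, end))  # keep the first occurrence
    return sections


def format_sections(lines, words):
    """Build the report lines, visiting keywords in the order given in words."""
    sections = find_sections(lines, words)
    found = [w for w in words if w in sections]  # missing keywords are skipped
    out = ["Sections Found: [" + ", ".join(found) + "]"]
    for word in found:
        start, end = sections[word]
        out.append("*" * 10 + word + "--SECTION" + "*" * 10)
        for num in range(start, end + 1):
            out.append(f"line {num}: {lines[num - 1].rstrip()}")
    return out


sample = """word1
abcdef
ghis jsd sjdhd jshj
word2
dgjgj dhkjhf
khkhkjd
word23
dfjkg fjidkfh
word5
diow299 udhgbhdi word2
jkdkjd
word89
eyuiywiou299092
word3
""".splitlines()

words = ["word1", "word2", "word3", "word4", "word5", "word23"]
print("\n".join(format_sections(sample, words)))
```

Matching on the whole stripped line, rather than with `in`, keeps word2 from matching inside word23 or in the middle of line 10. To read from a file instead, pass `f.read().splitlines()` as lines.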