I have strings that each contain two tab characters.
# File contains multiple lines like this
'T1 Original 210 227 Extra Mile'
'T8 Modified 1646 1655 Tickets'
# Eg: "Tx" "indication" "start_index" "end_index" "word"
# 'T1\tOriginal 210 227\tExtra Mile'
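I want the word(s) after the second tab. So I tried to find the indices of '\t' and replace the leading part of the string with an empty string.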
def find_index(s, ch):
    return [i for i, ltr in enumerate(s) if ltr == ch]

def extract_words(filename):
    extracted_data = [line.rstrip('\n') for line in open(filename)]
    search_key = '\t'
    for i in range(len(extracted_data)):
        indices = find_index(extracted_data[i], search_key)
        extracted_data[i] = extracted_data[i].replace(extracted_data[i][:indices[-1]], '')
    return extracted_data
But it does not find '\t': the index output is []. What is causing this?
Expected output:
'Extra Mile'
'Tickets'
Some of your lines do not contain any tabs, so no indices are found, and indexing the empty list raises an IndexError. Use:
if len(indices)>1: # only extract by slicing if indexes found!
to check for that.
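As a minimal sketch of that guard applied to the approach from the question: the function name `extract_after_last_tab` is made up for illustration, and slicing from one past the last tab is my substitution for the original `replace` call, not something the question used.

```python
def find_index(s, ch):
    """Return every position at which ch occurs in s."""
    return [i for i, ltr in enumerate(s) if ltr == ch]


def extract_after_last_tab(lines, search_key="\t"):
    """Keep the text after the last tab; skip lines with fewer than two tabs.

    Hypothetical helper, written only to illustrate the length check.
    """
    result = []
    for line in lines:
        indices = find_index(line, search_key)
        if len(indices) > 1:  # only extract by slicing if both tabs were found
            result.append(line[indices[-1] + 1:])
    return result


# The second sample line uses spaces instead of tabs, so it is skipped.
sample = ["T1\tOriginal 210 227\tExtra Mile", "T8 Modified 1646 1655 Tickets"]
print(extract_after_last_tab(sample))
```

Lines without two tabs are silently dropped here; whether to drop, report, or keep them unchanged depends on what you need.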
Why so complicated? Use str.split("\t"):
def extract_words(filename):
    with open(filename) as f:
        lines = [x.strip() for x in f.readlines()]
    k = []
    for l in lines:
        try:
            k.append(l.split("\t")[2])
        except IndexError:
            print(f"no 2 tabs in '{l}'")
    return k

t = """T1\tOriginal 210 227\tExtra Mile
T8\tModified 1646 1655\tTickets
Error\ttext"""
fn = "t.txt"
with open(fn, "w") as f:
    f.write(t)

print(*extract_words(fn), sep="\n")
Output:
no 2 tabs in 'Error text'
Extra Mile
Tickets
This works for lines that contain 2 tabs and reports any line that does not.
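If a line gets reported unexpectedly, it may help to look at its repr(), which shows tabs as \t instead of as whitespace that is easy to confuse with spaces. A small sketch (the helper name `describe_line` is hypothetical):

```python
def describe_line(line):
    """Return the number of tab characters in a line and its repr()."""
    return line.count("\t"), repr(line)


# First line has real tabs, second line only has spaces.
for line in ["T1\tOriginal 210 227\tExtra Mile", "T8 Modified 1646 1655 Tickets"]:
    tabs, shown = describe_line(line)
    print(tabs, shown)
```

A count of 0 means the separators in the file are really spaces, which would explain an empty index list.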