I have a huge file that looks like this:
-HVC1 tank
Contains300gallons
-HVC2 tank
Contains20gallonsofgasand220galonsofkero
The second file, which I read into a list, looks like this:
s = [['-HVC1', '0', '8'], ['-HVC1', '12', '18'], ['-HVC2', '9', '17']]
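For each entry in the list, I need to find the line in the file that belongs to the matching header (e.g. -HVC1 or -HVC2) and compare the position of each character in that line against the entry. Based on that, I want to extract the characters that fall within the range given by the other two values of the entry, e.g. 0,8; 12,18; 9,17.
The expected result for this example list is: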
-HVC1
Contains
-HVC1
gallons
-HVC2
20gallons
My code:
import csv
sequence =[]
with open('my_huge_file', 'r') as f:
    lines = f.readlines()
dic = {}
for line in lines:
    if line.startswith('-'):
        tx = line.split('tank', 1)[0] #include everything before tank in header
    else:
        gh = line[:-1]
        dic[tx] = gh
s = [['-HVC1', '0', '8'], ['-HVC1', '12', '18'], ['-HVC2', '9', '17']]
for i in s:
    seq =[]
    for m, n in dic.items():
        for j, k in enumerate(n):
            if int(i[1]) <= j <= int(i[2]) and m == i[0]:
                seq.append(k)
    sequence.append(seq)
print(sequence)
I get a list of empty lists in return:
[[], [], [], []]
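I know I am doing something wrong, but my logic seems to make sense to me. Any help would be greatly appreciated (preferably with an explanation). The result of print(sequence) should be: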
[[Contains], [gallons], [20gallons]]
I will then format that into the expected result shown above.
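@mkreiger1's comment is right: debugging helps a lot in a case like this.
The problem is the comparison m == i[0]: on the first iteration, m is '-HVC1 ' (with a trailing space) while i[0] is '-HVC1', so the comparison is always False. The trailing space is there because line.split('tank', 1)[0] keeps everything before 'tank', including the space that separates it from the header. The solution is to strip the whitespace: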
lines = ['-HVC1 tank', 'Contains300gallons', '-HVC2 tank',
         'Contains20gallonsofgasand220galonsofkero']
sequence = []
dic = {}
for line in lines:
    if line.startswith('-'):
        tx = line.split('tank', 1)[0]
    else:
        # Note: these list entries have no trailing newline, so [:-1]
        # drops the last real character of each content line here.
        gh = line[:-1]
        # THE FIX IS HERE: Strip the white spaces in ``tx``
        dic[tx.strip()] = gh
s = [['-HVC1', '0', '8'], ['-HVC1', '12', '18'], ['-HVC2', '9', '17']]
for i in s:
    seq = []
    for m, n in dic.items():
        for j, k in enumerate(n):
            # Keep characters whose 0-based index lies in [start, end], both ends inclusive
            if (int(i[1]) <= j <= int(i[2])) and (m == i[0]):
                seq.append(k)
    sequence.append(seq)
print(sequence)
Output:
[['C', 'o', 'n', 't', 'a', 'i', 'n', 's', '3'], ['a', 'l', 'l', 'o', 'n'], ['0', 'g', 'a', 'l', 'l', 'o', 'n', 's', 'o']]
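The lists are no longer empty, but they still differ from the expected result in the question, because the loop treats the two numbers as 0-based indices with both ends included. As a rough sketch only: if the ranges are instead read as 1-based, inclusive positions (my assumption, not something the question states; a start of 0 is simply clamped to the beginning), string slicing reproduces the expected output. The function name `extract_ranges` is made up for this illustration.

```python
def extract_ranges(lines, ranges):
    """Return one substring per [header, start, end] entry in `ranges`."""
    contents = {}
    header = None
    for line in lines:
        line = line.rstrip('\n')  # removes a trailing newline only if present
        if line.startswith('-'):
            # Keep the text before 'tank' and strip the trailing space
            header = line.split('tank', 1)[0].strip()
        elif header is not None:
            contents[header] = line

    results = []
    for name, start, end in ranges:
        text = contents.get(name, '')
        # ASSUMPTION: positions are 1-based and inclusive, so position p
        # maps to index p - 1; a start of 0 is clamped to index 0.
        begin = max(int(start) - 1, 0)
        results.append(text[begin:int(end)])
    return results


lines = ['-HVC1 tank', 'Contains300gallons', '-HVC2 tank',
         'Contains20gallonsofgasand220galonsofkero']
s = [['-HVC1', '0', '8'], ['-HVC1', '12', '18'], ['-HVC2', '9', '17']]

for (name, _, _), piece in zip(s, extract_ranges(lines, s)):
    print(name)
    print(piece)
```

Slicing also avoids the character-by-character loop, which matters for a huge file. When reading from the real file, the same function can be given the open file object instead of a list, since it only iterates over lines.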