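I am running code that takes each row of a CSV file and looks for an exact match of the entity in every file of a directory. The problem is that the code terminates after printing matches for four files, while the directory contains 5K files. I think the problem is in my break or continue statement. Can someone help me with this? The code so far: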
import csv
import os
import re

path = 'C:\\Users\\Lenovo\\.spyder-py3\\5KFILES\\'
with open('C:\\Users\\Lenovo\\.spyder-py3\\codes_file.csv', newline='', encoding='utf-8') as myFile:
    reader = csv.reader(myFile)
    for filenames in os.listdir(path):
        with open(os.path.join(path, filenames), encoding='utf-8') as my:
            content = my.read().lower()
            #print(content)
            for row in reader:
                if len(row[1]) >= 4:
                    #v = re.search(r'(?<!\w){}(?!\w)'.format(re.escape(row[1])), content, re.I)
                    v = re.search(r'\b' + re.escape(row[1]) + r'\b', content, re.IGNORECASE)
                    if v:
                        print(filenames, v.group(0))
                        break
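The reader object is created before your for loop, and it is an iterator. Each time execution reaches the "for row in reader" line, iteration resumes from where it stopped. Once the end of reader has been reached, every later "for row in reader" loop is empty.
You can see what happens in this short example: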
l = [0, 1, 2, 3, 4, 5]
iterator = iter(l)
for i in range(0, 16, 2):
    print('i:', i, "- starting the 'for j ...' loop")
    for j in iterator:
        print('iterator:', j)
        if j == i:
            break
i: 0 - starting the 'for j ...' loop
iterator: 0
i: 2 - starting the 'for j ...' loop
iterator: 1
iterator: 2
i: 4 - starting the 'for j ...' loop
iterator: 3
iterator: 4
i: 6 - starting the 'for j ...' loop
iterator: 5
i: 8 - starting the 'for j ...' loop
i: 10 - starting the 'for j ...' loop
i: 12 - starting the 'for j ...' loop
i: 14 - starting the 'for j ...' loop
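Each time the inner for loop runs, it continues iterating over iterator from the point where it previously stopped. After the iterator is exhausted, the "for j ..." loop is empty.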
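You should recreate the reader on each pass. Because the new csv.reader reads from the same file object, rewind the file first; otherwise the new reader starts at the end of the file and yields nothing:
myFile.seek(0)
for row in csv.reader(myFile):
    ....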
Or build a list once, which can be iterated as many times as needed:
reader = list(csv.reader(myFile))
....
for row in reader:
    ....
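Putting the list approach together with the code from the question, a minimal sketch could look like the following. The function name find_matches and its parameters (csv_path, directory, min_len) are made up for illustration and are not part of the original code. It assumes, as the question does, that the entity is in the second CSV column and that only the first matching entity per file is wanted.

```python
import csv
import os
import re


def find_matches(csv_path, directory, min_len=4):
    """Return (filename, matched_text) pairs, at most one pair per file.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Read the CSV once into a list so it can be iterated for every file.
    with open(csv_path, newline='', encoding='utf-8') as csv_file:
        rows = list(csv.reader(csv_file))

    # Keep the second column, skipping rows without one and short entities.
    entities = [row[1] for row in rows
                if len(row) > 1 and len(row[1]) >= min_len]

    results = []
    for filename in sorted(os.listdir(directory)):
        with open(os.path.join(directory, filename), encoding='utf-8') as f:
            content = f.read().lower()  # lowercased, as in the question
        for entity in entities:
            match = re.search(r'\b' + re.escape(entity) + r'\b',
                              content, re.IGNORECASE)
            if match:
                results.append((filename, match.group(0)))
                break  # stop at the first matching entity for this file
    return results
```

Calling find_matches with the CSV path and the directory from the question and printing each returned pair should reproduce the intended output for all files, since the list of entities is no longer consumed by the first few files.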