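Hi all,

I'm building a very simple decoder, but I'm still quite new to Python and I can't figure out why it writes the value of `tags_line` (a list) to the file `filen_e` twice. The list `tags_line` is supposed to capture words the encoder doesn't know and store them in a separate file.

Here is the whole encoder function: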
```python
# the encoder function ;; filen_e is the file that stores words that aren't in the dictionary
def encoder(lst, filen=dictionary, filen_e=filen_e, verbose=False): # wordlist / dictionary ######## >>>> NEEDS FIXING !!!!!!
    encoded = []
    counter = 1
    counter2 = 0
    tags = []
    tags_nrs = []
    tags_line = []
    score = False
    check = False
    with open(filen, "r", encoding="utf-8") as f:
        comprehension = f.read().splitlines()
        if verbose == True:
            print(f"Type comprehension: {type(comprehension)}{comprehension}\n")
        for i in range(len(lst)):
            for j in range(len(comprehension)):
                if lst[i] == comprehension[j]:
                    # match
                    encoded.append(counter)
                    counter = 1
                    score = True
                    break
                if counter >= len(comprehension):
                    counter = 1
                    print(f"- appending: 0")
                    encoded.append(0) # in case there is no match
                    print(f"- appending: {i},{lst[i]}")
                    tags_line.append([i, lst[i]]) # for storing in file "filen_e"
                    break
                counter += 1
            counter2 += 1
        f.close()
    if counter2 > 1:
        s = "s."
    else:
        s = "."
    if verbose == True:
        # output for the function
        print(f"+(Encoder)")
        print(f"Processed: ({counter2}) token{s}")
        print(f"String: {lst}")
        print(f"Encoded string: {encoded}\n")
    logmsg = f"encoder({counter2}) :: 'Processed: ({counter2}) token{s}'"
    log(logmsg)
    if check==True:
        logmsg = f"encoder({counter2}) :: 'Can not process: ({tags}) token{s}'"
        log(logmsg)
    with open(filen_e, "a", encoding="utf-8") as f:
        # tags_line = f"tags: {tags_nrs}:{tags},"
        print(f"- unknown tags: {tags_line}")
        f.write(str(tags_line))
        tags_line = []
        f.close()
    return encoded
```
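For reference, the nested loop maps each token to its 1-based line number in the dictionary file (first match wins) and appends 0 when the word isn't there. Below is a minimal sketch of the same mapping with a dict lookup, assuming the dictionary is already loaded as a list of lines. `encode_tokens` is a hypothetical name, and unlike the original it does no file I/O: it returns the unknown `[index, token]` pairs instead of writing them.

```python
def encode_tokens(tokens, dictionary_words):
    """Map each token to its 1-based line number in the dictionary, or 0 if unknown.

    Returns (encoded, unknown), where unknown holds [index, token] pairs,
    like the tags_line list in the function above. No files are touched.
    """
    # Build the lookup once: word -> 1-based position (first occurrence wins).
    position = {}
    for lineno, word in enumerate(dictionary_words, start=1):
        position.setdefault(word, lineno)

    encoded = []
    unknown = []
    for i, token in enumerate(tokens):
        code = position.get(token, 0)  # 0 marks "not in the dictionary"
        encoded.append(code)
        if code == 0:
            unknown.append([i, token])
    return encoded, unknown


# Toy dictionary standing in for the lines of the real dictionary file.
words = ["hello", "there", "friend"]
print(encode_tokens(["hello", "homia", "friend"], words))
# → ([1, 0, 3], [[1, 'homia']])
```

Keeping the encoder free of file writes means that calling it more than once cannot duplicate anything on disk.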
What it logs shows up in the program's output as:
```
+(beta_encoder):
- appending: 0
- appending: 2,homia
- appending: 0
- appending: 4,ducken
- appending: 0
- appending: 6,pls
- unknown tags: [[2, 'homia'], [4, 'ducken'], [6, 'pls']]
+(encode).reverse encoding
[6195, 5879, 0, 6085, 0, 3700, 0]
```
The last line is the encoded string. The three 0s in it mean it caught words it doesn't know, as shown by `- unknown tags: [[2, 'homia'], ...]`.
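As a quick sanity check on that output, the positions of the 0 codes line up exactly with the indices stored in the unknown tags. Both lists below are copied from the output above; nothing else is assumed.

```python
# Encoded string and unknown tags copied from the program output above.
encoded = [6195, 5879, 0, 6085, 0, 3700, 0]
unknown_tags = [[2, 'homia'], [4, 'ducken'], [6, 'pls']]

# The indices of the 0 codes should match the indices recorded in unknown_tags.
zero_positions = [i for i, code in enumerate(encoded) if code == 0]
print(zero_positions)  # → [2, 4, 6]
print(zero_positions == [index for index, _ in unknown_tags])  # → True
```

So the encoder itself records each unknown word exactly once per call.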
Then the file `filen_e` (where these unknown tags are supposed to be written) has the following content:
```
[[2, 'homia'], [4, 'ducken'], [6, 'pls']][[2, 'homia'], [4, 'ducken'], [6, 'pls']]
```
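ChatGPT told me I should clear `tags_line` at the start of the function, but that is already the case: I initialize it as an empty list at the top of the function.

This is really puzzling.

Note: the file is opened with `"a"` (append mode), but even when I delete the file first, the list still gets written twice. I even reset `tags_line` right after it is written, so it can't be a second call to the encoder function.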
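Because `tags_line` is a local variable, every call starts with a fresh empty list, and resetting it after the write only affects the call that is finishing. So in append mode, a doubled line means the write itself ran twice during the same run, and deleting the file beforehand doesn't change that. A minimal reproduction, using a hypothetical `write_unknowns` helper and a temporary file in place of `filen_e`:

```python
import os
import tempfile


def write_unknowns(path, unknown):
    # Same pattern as the end of encoder(): a fresh local list on every call,
    # appended to the file, then "reset" (which only affects this call).
    tags_line = list(unknown)
    with open(path, "a", encoding="utf-8") as f:
        f.write(str(tags_line))
    tags_line = []  # has no effect on any later call


path = os.path.join(tempfile.mkdtemp(), "unknown_tags.txt")  # stand-in for filen_e
tags = [[2, 'homia'], [4, 'ducken'], [6, 'pls']]

write_unknowns(path, tags)  # e.g. the call from the bootstrap code
write_unknowns(path, tags)  # e.g. a second call made by some other function

with open(path, encoding="utf-8") as f:
    print(f.read())
# → [[2, 'homia'], [4, 'ducken'], [6, 'pls']][[2, 'homia'], [4, 'ducken'], [6, 'pls']]
```

This reproduces the file content shown above exactly, even though the file did not exist before the run.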
Also, the encoder function is only called once, in my other "bootstrap" function:
```python
### bootstrap function ### next encode some text ###
encoded_seq = encode(lst=inputs.split(' '), fwd=False, verbose=True)
print("\n", encoded_seq, "\n")
### FUNCTIONS ###
```
Never mind, I feel really stupid: my other function, `store`, calls the same encoder function when `encoders` is set to True.
```python
def store(txt, encoders=True, filen=filename, endl=storeUsingNewline):
    ### as such... duh moment
```
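The real body of `store` isn't shown, so here is a hedged sketch of one way to avoid the duplicate write: encode once and hand the result to the other function instead of letting it call the encoder again. Both `encoder` and `store` below are simplified, hypothetical stand-ins (the `encoded` parameter in particular is an assumption), and a temporary file replaces `filen_e`. Calling the real `store` with `encoders=False` when the text is already encoded would presumably have the same effect.

```python
import os
import tempfile

UNKNOWN_FILE = os.path.join(tempfile.mkdtemp(), "unknown_tags.txt")  # stand-in for filen_e


def encoder(tokens, words):
    """Simplified stand-in for encoder(): encodes and appends unknowns to UNKNOWN_FILE."""
    encoded = [words.index(t) + 1 if t in words else 0 for t in tokens]  # 1-based, 0 = unknown
    unknown = [[i, t] for i, t in enumerate(tokens) if t not in words]
    with open(UNKNOWN_FILE, "a", encoding="utf-8") as f:
        f.write(str(unknown))
    return encoded


def store(txt, words, encoders=True, encoded=None):
    """Hypothetical store(): only re-encodes when no encoded result is passed in."""
    if encoders and encoded is None:
        encoded = encoder(txt.split(" "), words)  # this would be the hidden second call
    return encoded


words = ["hello", "there", "friend"]
text = "hello homia friend"

encoded = encoder(text.split(" "), words)  # the first and only encoder call
store(text, words, encoded=encoded)        # reuses the result, so no second write

with open(UNKNOWN_FILE, encoding="utf-8") as f:
    print(f.read())  # → [[1, 'homia']]
```

Calling `store(text, words)` without the encoded result would run the encoder again and append the same list a second time, which is exactly the behaviour described in the question.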