When I read some files like this:
    list_of_files = glob.glob('./*.txt')  # create the list of files
    for file_name in list_of_files:
        FI = open(file_name, 'r', encoding='cp1252')
I get this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1260: character maps to <undefined>
When I switch to this:
    list_of_files = glob.glob('./*.txt')  # create the list of files
    for file_name in list_of_files:
        FI = open(file_name, 'r', encoding="utf-8")
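I get this error instead:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1459: invalid start byte
I have read that I should open the file as binary, but I don't know how to do that. Here is my function: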
    def readingAndAddToList():
        list_of_files = glob.glob('./*.txt')  # create the list of files
        for file_name in list_of_files:
            FI = open(file_name, 'r', encoding="utf-8")
            stext = textProcessing(FI.read())
            # split returns a list of words delimited by sequences of whitespace
            # (including tabs, newlines, etc., like re's \s)
            secondaryWord_list = stext.split()
            word_list.extend(secondaryWord_list)  # add words to the main list
            print("Lungimea fisierului ", FI.name, " este de", len(secondaryWord_list), "caractere")
            sortingAndNumberOfApparitions(secondaryWord_list)
            FI.close()
Only the beginning of my function matters, because the error occurs in the reading part.
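If you are on Windows, open the file in Notepad and save it again with the encoding you want. On Linux, do the same in a text editor. Hopefully your program will then run.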
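If there are too many files to re-save by hand, the same idea can be sketched in Python. The two errors in the question suggest the folder mixes encodings: byte 0x9d has no mapping in cp1252, and byte 0x92 is a curly apostrophe in cp1252 but is not a valid start byte in UTF-8. The sketch below opens each file in binary mode, tries UTF-8 first, then cp1252, and as a last resort replaces undecodable bytes. The helper names `read_text_with_fallback` and `convert_to_utf8` and the order of the encodings are assumptions for illustration, not something taken from the question.

```python
def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_used) for the file at `path`.

    Hypothetical helper: the encoding list is an assumption based on
    the two errors shown in the question.
    """
    # 'rb' opens the file in binary mode, so no decoding happens here.
    with open(path, "rb") as f:
        raw = f.read()

    # Try each candidate encoding strictly, in order.
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue

    # Last resort: decode with the final candidate and replace every
    # undecodable byte with U+FFFD instead of raising.
    last = encodings[-1]
    return raw.decode(last, errors="replace"), last


def convert_to_utf8(path):
    """Rewrite the file in place as UTF-8 (the scripted 'save as')."""
    text, _ = read_text_with_fallback(path)
    # Write bytes directly so existing line endings are left untouched.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
```

In the function from the question, `FI.read()` could then be replaced by `read_text_with_fallback(file_name)[0]`. Keep in mind that cp1252 is only a guess for the non-UTF-8 files; if they were written in another encoding, the text will decode without an error but contain wrong characters.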