I'm trying to train a chatbot by importing message examples, but it gives me a Unicode error


This module is supposed to filter and clean the text, but I haven't been able to get it to work.

> import re
> 
> chat_export_file = "C:\\Users\\User\\OneDrive\\Documents\\chatty.txt"
> 
> def remove_chat_metadata(chat_export_file):
>     # Matches the "date, time - sender: " prefix on each exported message
>     pattern = r"(\d+\/\d+\/\d+,\s\d+:\d+)\s-\s([\w\s]+):\s"
> 
>     with open(chat_export_file, "r") as corpus_file:
>         content = corpus_file.read()
>     cleaned_corpus = re.sub(pattern, "", content)
>     return tuple(cleaned_corpus.split("\n"))
> 
> def clean_corpus(chat_export_file):
>     message_corpus = remove_chat_metadata(chat_export_file)
>     # remove_non_message_text is not shown in this question
>     cleaned_corpus = remove_non_message_text(message_corpus)
>     return cleaned_corpus
> 
> cleaned_corpus = clean_corpus(chat_export_file)
> print(cleaned_corpus)

I expected it to clean and filter the text, but it just gives me this error:

> Traceback (most recent call last):
>   File "C:\Users\User\AppData\Local\Programs\Python\Python310\cleaner.py", line 25, in <module>
>     cleaned_corpus = clean_corpus(chat_export_file)
>   File "C:\Users\User\AppData\Local\Programs\Python\Python310\cleaner.py", line 22, in clean_corpus
>     message_corpus = remove_chat_metadata(chat_export_file)
>   File "C:\Users\User\AppData\Local\Programs\Python\Python310\cleaner.py", line 17, in remove_chat_metadata
>     content = corpus_file.read()
>   File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
>     return codecs.charmap_decode(input,self.errors,decoding_table)[0]
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 8545: character maps to <undefined>

I don't know what this error means or what could be causing it. Any help would be greatly appreciated, thanks!
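
For context on the traceback: `open()` is called without an `encoding` argument, so Python decodes the file with the platform default, which the `cp1252.py` frame shows is cp1252 here. Byte `0x8d` has no character assigned in cp1252, hence the `UnicodeDecodeError`. A plausible cause, though only a guess without seeing the file, is that the chat export is actually UTF-8 and contains non-ASCII characters (for example, the UTF-8 encoding of the 👍 emoji ends in byte `0x8d`). The sketch below assumes UTF-8 and passes the encoding explicitly; the `encoding` parameter on `remove_chat_metadata` is an addition for illustration and is not part of the original code.

```python
import re

# Same regex as in the question: the "date, time - sender: " message prefix.
PATTERN = r"(\d+\/\d+\/\d+,\s\d+:\d+)\s-\s([\w\s]+):\s"


def remove_chat_metadata(chat_export_file, encoding="utf-8"):
    """Read a chat export and strip the metadata prefix from each message.

    Assumption: the export is UTF-8 encoded. If it was saved in another
    encoding, pass that encoding name instead.
    """
    # Passing encoding explicitly stops open() from falling back to the
    # platform default (cp1252 in the traceback above).
    with open(chat_export_file, "r", encoding=encoding) as corpus_file:
        content = corpus_file.read()
    cleaned_corpus = re.sub(PATTERN, "", content)
    return tuple(cleaned_corpus.split("\n"))
```

If the file turns out not to be valid UTF-8 either, `open()` also accepts `errors="replace"`, which substitutes undecodable bytes instead of raising, at the cost of losing those characters.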

python chatbot python-unicode