UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

Question

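I'm trying to get a Python 3 program to do some manipulations with a text file full of information. However, when trying to read the file I get the following error: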

Traceback (most recent call last):  
   File "SCRIPT LOCATION", line NUMBER, in <module>  
     text = file.read()
   File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode  
     return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2907500: character maps to <undefined>

Answer 1

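The file in question is not using the CP1252 encoding. It's using another encoding, and you have to figure out which one yourself. Common ones are Latin-1 and UTF-8. Since 0x90 doesn't really mean anything in Latin-1 (it maps to an unused C1 control character), UTF-8 (where 0x90 is a continuation byte) is more likely.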

Specify the encoding when you open the file:

file = open(filename, encoding="utf8")
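A minimal sketch that reproduces the error and the fix, assuming the file really is UTF-8. It writes a small temporary file containing the Cyrillic letter А (U+0410), whose UTF-8 encoding is the two bytes D0 90, so 0x90 shows up as a continuation byte just as in the traceback. The file name demo.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample file: "А" (U+0410) is encoded as b"\xd0\x90" in UTF-8.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Привет, А!")

# Reading it as cp1252 (the Windows default in the question) fails on byte 0x90.
try:
    with open(path, encoding="cp1252") as f:
        f.read()
    cp1252_failed = False
except UnicodeDecodeError as err:
    cp1252_failed = True
    print(err)

# Reading it with the file's actual encoding works.
with open(path, encoding="utf8") as f:
    text = f.read()
print(len(text), "characters decoded")
```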

Answer 2

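If

file = open(filename, encoding="utf-8")

doesn't work, try

file = open(filename, errors="ignore")

if you are willing to drop the characters that cannot be decoded (see the errors parameter of open() in the Python docs).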
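To see what errors="ignore" actually does, the sketch below decodes the same made-up bytes strictly and then reads them from a file with errors="ignore". It passes the encoding explicitly (cp1252, as in the question) rather than relying on the platform default, so it behaves the same on every machine; the file name sample.txt is hypothetical.

```python
import os
import tempfile

raw = b"abc\x90def"  # made-up sample; 0x90 is undefined in cp1252

# Strict decoding (the default) raises on the undefined byte.
try:
    raw.decode("cp1252")
except UnicodeDecodeError as err:
    print("strict:", err.reason)

# open() takes the same errors= argument; "ignore" drops the byte silently.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical file
with open(path, "wb") as f:
    f.write(raw)
with open(path, encoding="cp1252", errors="ignore") as f:
    cleaned = f.read()

print("ignore:", cleaned)
```

Note that nothing in the result tells you a byte was removed, which is the trade-off of this approach.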

Answer 3

Alternatively, if you don't need to decode the file at all, for example because you are just uploading it to a website, use:

open(filename, 'rb')

where r = reading, b = binary.
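A short sketch of what binary mode gives you: read() returns a bytes object and no codec is involved, so a UnicodeDecodeError cannot occur. The file name upload.bin and its contents are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "upload.bin")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"abc\x90def")  # contains a byte that cp1252 cannot decode

# 'rb' returns the raw bytes unchanged; nothing is decoded.
with open(path, "rb") as f:
    payload = f.read()

print(type(payload).__name__, len(payload))
# If you later need text, decode explicitly once you know the encoding,
# e.g. payload.decode("utf-8").
```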

Answer 4

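As an extension to @LennartRegebro's answer:

If you don't know what encoding your file uses, the solution above doesn't work (it's not utf8), and you find yourself just guessing, there are online tools you can use to identify the encoding. They aren't perfect, but they usually work quite well. Once you have figured out the encoding, you should be able to use the solution above.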

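Edit (copied from the comments):

The very popular text editor Sublime Text has a command to display the encoding, if one has been set...

Go to View -> Show Console (or Ctrl+`).

In the input field at the bottom, type

view.encoding()

and hope for the best (all I get is Undefined, but maybe you'll have better luck...).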
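If you would rather stay in Python than use an online tool or an editor, a crude alternative is to try a short list of candidate encodings in order and keep the first one that decodes without errors. This is only a heuristic sketch, not a real detector: single-byte encodings such as latin-1 accept every byte, so they must come last, and a successful decode does not prove the guess is right. The function name guess_encoding and the candidate list are assumptions for illustration; third-party packages such as chardet or charset-normalizer do proper statistical detection.

```python
from typing import Optional, Sequence

def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("utf-8", "cp1252", "latin-1")) -> Optional[str]:
    """Return the first candidate codec that decodes data without errors, else None."""
    for enc in candidates:
        try:
            data.decode(enc)  # strict decoding: any invalid/undefined byte raises
        except UnicodeDecodeError:
            continue
        return enc
    return None

# Usage on a file (hypothetical name):
# with open("mystery.txt", "rb") as f:
#     print(guess_encoding(f.read()))

print(guess_encoding("Привет".encode("utf-8")))  # valid UTF-8
print(guess_encoding(b"caf\xe9"))                # invalid UTF-8, valid cp1252
print(guess_encoding(b"\x90\x81"))               # only latin-1 accepts these bytes
```

Because latin-1 never fails, putting it last means the function effectively always returns something; drop it from the list if you would rather get None for "no confident guess".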

Answer 5

TL;DR: try:

file = open(filename, encoding='cp437')

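Why? When you use:

file = open(filename)
text = file.read()

Python assumes the file uses the same code page as the current environment (cp1252 in the OP's case) and decodes it with that codec into a Python (Unicode) str. If the file contains a byte value that is undefined in this code page (such as 0x90), we get a UnicodeDecodeError. Sometimes we don't know the file's encoding, sometimes the file's encoding may not be supported by Python (e.g. cp790), and sometimes the file may contain mixed encodings.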

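If such characters are not needed, one may decide to replace them with the Unicode replacement character U+FFFD (�), i.e.:

file = open(filename, errors='replace')

Another workaround is to use:

file = open(filename, errors='ignore')

The undecodable bytes are then silently dropped, but other errors are masked too.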
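The difference between the two error handlers is easiest to see on a few made-up bytes. open() passes its errors= argument to the same decoder, so reading a file behaves the same way as these bytes.decode() calls.

```python
raw = b"abc\x90def"  # made-up bytes; 0x90 is undefined in cp1252

replaced = raw.decode("cp1252", errors="replace")
ignored = raw.decode("cp1252", errors="ignore")

print(ascii(replaced))  # 'abc\ufffddef': U+FFFD marks where the bad byte was
print(ascii(ignored))   # 'abcdef': the bad byte vanishes without a trace
```

With "replace" you can at least search for U+FFFD afterwards to find out whether anything was lost.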

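A good solution is to specify an encoding, but not just any encoding (like cp1252): use one in which every byte value is defined (like cp437):

file = open(filename, encoding='cp437')

Code page 437 is the original DOS encoding. All 256 codes are defined, so there are no errors while reading the file, no errors are masked, and the characters are preserved (not quite intact, but still distinguishable).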
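A quick check of the claim above: decode every possible byte value with cp1252 and with cp437. Python's cp1252 codec leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90 and 0x9D), while cp437 maps all 256 of them, and the mapping can be reversed to recover the original bytes. The same is true of latin-1. Keep in mind that the decoded text is only "right" if the file really is cp437; otherwise you get readable but wrong characters.

```python
every_byte = bytes(range(256))

# cp1252 has undefined byte values, so strict decoding fails.
try:
    every_byte.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# cp437 maps every byte value to some character...
text = every_byte.decode("cp437")
# ...and the mapping is reversible, so no information is lost.
round_trip = text.encode("cp437")

print(cp1252_ok, len(text), round_trip == every_byte)
```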

Answer 6

Stop wasting your time: just add encoding="cp437" and errors='ignore' to your read and write code:

open('filename.csv', encoding="cp437", errors='ignore')
open(file_name, 'w', newline='', encoding="cp437", errors='ignore')

Blazing fast.
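One thing worth checking before adopting this blanket approach: on the write side, errors='ignore' silently drops any character that cp437 cannot represent. The sketch below writes a made-up line containing the euro sign, which has no cp437 code, and reads the file back. On the read side errors='ignore' never triggers, because cp437 defines every byte. The file name out.csv is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.csv")  # hypothetical output file

with open(path, "w", newline="", encoding="cp437", errors="ignore") as f:
    f.write("price;5€\n")  # "€" is not in cp437

with open(path, encoding="cp437", errors="ignore") as f:
    written = f.read()

print(ascii(written))  # the euro sign is gone, and no error was raised
```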

Answer 7

def read_files(file_path):

    with open(file_path, encoding='utf8') as f:
        text = f.read()
        return text

Or (and):

def write_files(text, file_path):
    # 'wb' (binary write): a file opened with 'rb' cannot be written to
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))

Or (with the python-docx package):

from docx import Document  # python-docx

# file_path is a pathlib.Path pointing at the UTF-8 text file
document = Document()
document.add_heading(file_path.name, 0)
file_content = file_path.read_text(encoding='UTF-8')
document.add_paragraph(file_content)

Or:

def read_text_from_file(cale_fisier):  # cale_fisier: file path (a pathlib.Path)
    text = cale_fisier.read_text(encoding='UTF-8')
    print("What I read: ", text)
    return text  # return the text that was read

def save_text_into_file(cale_fisier, text):
    with open(cale_fisier, "w", encoding='utf-8') as f:  # open file; closed automatically
        print("What I wrote: ", text)
        f.write(text)  # write the content to the file

Or:

def read_text_from_file(file_path):
    with open(file_path, encoding='utf8', errors='ignore') as f:
        text = f.read()
        return text  # return the text that was read


def write_to_file(text, file_path):
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))  # write the content to the file

Or:

import os
import glob

def change_encoding(fname, from_encoding, to_encoding='utf-8') -> None:
    '''
    Reads the file at path fname with its original encoding (from_encoding)
    and rewrites it with to_encoding.
    '''
    with open(fname, encoding=from_encoding) as f:
        text = f.read()

    with open(fname, 'w', encoding=to_encoding) as f:
        f.write(text)
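A quick way to try change_encoding on a throwaway file before pointing it at real data (it rewrites the file in place, so a wrong from_encoding can garble or break it; keep a backup). The function is repeated so the snippet runs on its own; the file legacy.txt and its cp1252 contents are made up.

```python
import os
import tempfile

def change_encoding(fname, from_encoding, to_encoding='utf-8') -> None:
    '''Read fname with from_encoding and rewrite it in place with to_encoding.'''
    with open(fname, encoding=from_encoding) as f:
        text = f.read()
    with open(fname, 'w', encoding=to_encoding) as f:
        f.write(text)

# Hypothetical legacy file saved as cp1252: "café" is b"caf\xe9".
path = os.path.join(tempfile.mkdtemp(), "legacy.txt")
with open(path, "wb") as f:
    f.write(b"caf\xe9")

change_encoding(path, from_encoding="cp1252")

with open(path, "rb") as f:
    converted = f.read()
print(converted)  # b'caf\xc3\xa9': "é" is now two UTF-8 bytes
```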

Answer 8

For me, using the utf16 encoding worked:

file = open('filename.csv', encoding="utf16")
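A file that only opens with utf16 often starts with a byte order mark (BOM), so one hedged way to check before guessing is to look at its first few bytes. The helper name encoding_from_bom and the sample file are hypothetical, and a missing BOM proves nothing: UTF-16 without a BOM has to be identified some other way.

```python
import codecs
import os
import tempfile
from typing import Optional

def encoding_from_bom(path: str) -> Optional[str]:
    """Return a codec name if the file starts with a known BOM, else None."""
    with open(path, "rb") as f:
        head = f.read(4)
    # The UTF-32 LE BOM begins with the UTF-16 LE BOM, so test UTF-32 first.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None

path = os.path.join(tempfile.mkdtemp(), "data.csv")  # hypothetical file
with open(path, "w", encoding="utf16") as f:  # Python's utf16 codec writes a BOM
    f.write("a,b\n1,2\n")

enc = encoding_from_bom(path)
with open(path, encoding=enc) as f:
    lines = f.read().splitlines()
print(enc, lines)
```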

Answer 9

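For those using Anaconda on Windows: I had the same problem, and Notepad++ helped me solve it.

Open the file in Notepad++. In the bottom-right corner it shows the current file encoding. In the top menu, next to "View", find "Encoding". Under "Encoding" go to "Character sets" and patiently look for the encoding you need. In my case the encoding "Windows-1252" was found under "Western European".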

Answer 10

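Before applying the suggested solutions, you can check which Unicode character appears in the file (and in the error log), in this case 0x90: https://unicodelookup.com/#0x90/1 (or directly on the Unicode Consortium website, http://www.unicode.org/charts/, by searching for 0x0090).

Then consider removing it from the file.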
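Keep in mind that a raw byte is not the same thing as a Unicode code point, so it also helps to look at the bytes around the reported position. The sketch below reads a made-up file in binary mode, decodes it with cp1252 (the codec from the question), and uses the start and end attributes of the UnicodeDecodeError to show the offending byte and its neighbours. The file name and the 8-byte context window are arbitrary choices for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")  # hypothetical file
with open(path, "wb") as f:
    f.write("Hello, Привет, А!".encode("utf-8"))

with open(path, "rb") as f:
    data = f.read()

bad, offset = None, None
try:
    data.decode("cp1252")
except UnicodeDecodeError as err:
    bad, offset = data[err.start:err.end], err.start
    context = data[max(err.start - 8, 0):err.end + 8]
    print(f"byte {bad!r} at offset {offset}")
    print("surrounding bytes:", context)
```

Here the byte right before 0x90 is 0xD0, a UTF-8 lead byte, which is a strong hint that the file is UTF-8 rather than a stray character to delete.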

Answer 11

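In newer versions of Python (starting with 3.7), you can add the interpreter option -Xutf8, which should fix your problem. If you use PyCharm, just go to Run > Edit Configurations (on the Configuration tab, change the value in the Interpreter options field to -Xutf8).

Or, equivalently, you can set the environment variable PYTHONUTF8 to 1.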
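To confirm that -Xutf8 or PYTHONUTF8=1 actually took effect, you can ask a child interpreter to report sys.flags.utf8_mode (1 when UTF-8 mode is on) and the preferred encoding that open() falls back to when no encoding is given. A sketch assuming Python 3.7 or later; launching subprocesses is just a convenient way to compare both switches from one script.

```python
import os
import subprocess
import sys

code = "import sys, locale; print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"

# Equivalent to running: python -X utf8 -c "..."
with_flag = subprocess.run([sys.executable, "-X", "utf8", "-c", code],
                           capture_output=True, text=True, check=True).stdout.split()

# Equivalent to setting the environment variable PYTHONUTF8=1.
env = dict(os.environ, PYTHONUTF8="1")
with_env = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, check=True, env=env).stdout.split()

print(with_flag, with_env)
```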

Answer 12

For me, changing the MySQL character encoding to match the one used in my code helped me find the solution.

photo=open('pic3.png',encoding='latin1')

Answer 13

Here is an example of how I open and close files with UTF-8, taken from some recent code:

def traducere_v1_txt(translator, file):
    # base_path, input_lang and lxml1 are defined elsewhere in the original code
    data = []
    with open(f"{base_path}/{file}", "r", encoding='utf8', errors='ignore') as open_file:
        data = open_file.readlines()

    file_name = file.replace(".html", "")
    with open(f"Translated_Folder/{file_name}_{input_lang}.html", "w", encoding='utf8') as htmlfile:
        htmlfile.write(lxml1)
