What character set does "é" come from? (Python: file names containing "é", how to use os.path.exists, filecmp.cmp, shutil.move?)

Question

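What character set does é come from? In Windows Notepad, an ANSI text file containing this character saves fine. If I insert something like 😍, I get an error. é also seems to work fine in PuTTY's ASCII terminal (are CP437 and IBM437 the same thing?), while 😍 does not.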
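To see which character sets can represent each character, here is a minimal Python 3 sketch (Python 3 str is Unicode) that tries to encode é and 😍 with the code pages mentioned above. It assumes Notepad's "ANSI" is Windows-1252, which is typical for Western-European Windows installs but depends on the asker's locale. The helper name try_encode is hypothetical.

```python
import codecs

def try_encode(text, encoding):
    """Hypothetical helper: return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

for ch in ("é", "😍"):
    print("U+%04X" % ord(ch))  # code point: U+00E9 for é, U+1F60D for 😍
    for enc in ("cp1252", "cp437", "utf-8"):
        print("  %-7s %r" % (enc, try_encode(ch, enc)))

# IBM437 is registered as an alias of CP437 in Python's codec registry.
print(codecs.lookup("ibm437").name)  # cp437
```

é has a single-byte slot in both Windows-1252 (0xE9) and CP437 (0x82), so both Notepad's ANSI mode and a CP437 terminal can show it. 😍 exists only in Unicode encodings such as UTF-8, which is why the ANSI save fails.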

I can see that 😍 is Unicode, not ASCII. But what is é? It doesn't trigger the error Notepad gives me for Unicode characters, yet Python raised "SyntaxError: Non-ASCII character '\xc3' in file on line , but no encoding declared;" until I added the "magic comment" recommended in "Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP)".
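The byte '\xc3' in that SyntaxError is a clue: it is the first byte of é encoded as UTF-8, whereas Windows-1252 would store é as the single byte 0xE9. This suggests the script file itself was saved as UTF-8. A short Python 3 check:

```python
# The "\xc3" in the SyntaxError is the first byte of "é" encoded as UTF-8.
raw = "é".encode("utf-8")
print(raw)                   # b'\xc3\xa9'
print(hex(raw[0]))           # 0xc3
print("é".encode("cp1252"))  # b'\xe9', what an ANSI (cp1252) file would contain

# Python 2 assumed ASCII source when no coding declaration was present (PEP 263),
# and neither UTF-8 byte of "é" lies in the ASCII range 0x00-0x7F.
print(any(byte < 0x80 for byte in raw))  # False
```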

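I added the "magic comment" and no longer get that error, but os.path.isfile() reports that a file name containing é does not exist. Ironically, é appears in the name of Marc-André Lemburg, the author of the PEP the error message links to.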
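One plausible explanation, sketched in Python 3 under stated assumptions: in Python 2, a plain str literal in a UTF-8 source file holds UTF-8 bytes, and on Windows a byte-string path goes to the "ANSI" file APIs, which decode it with the ANSI code page (assumed here to be cp1252). Windows then looks for a differently spelled name. Text paths (Python 3 str, Python 2 unicode) avoid that, and os.path.isfile, filecmp.cmp and shutil.move all accept them. The file names "Copié" and "Déplacé" are hypothetical.

```python
import filecmp
import os
import shutil
import tempfile

# The Python 2 literal r"Filéname" in a UTF-8 source file held these bytes.
utf8_bytes = "Filéname".encode("utf-8")
# A byte path handed to an ANSI API is decoded with the ANSI code page
# (cp1252 assumed), so the lookup is for a different name.
ansi_view = utf8_bytes.decode("cp1252")
print(ansi_view)  # FilÃ©name

# Using text paths end to end keeps the name intact.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "Filéname")
    with open(original, "w", encoding="utf-8") as f:
        f.write("demo\n")

    exists = os.path.isfile(original)
    mojibake_exists = os.path.isfile(os.path.join(folder, ansi_view))

    copy = os.path.join(folder, "Copié")                  # hypothetical name
    shutil.copy(original, copy)
    same = filecmp.cmp(original, copy, shallow=False)      # compare contents

    moved = shutil.move(copy, os.path.join(folder, "Déplacé"))  # hypothetical name
    moved_ok = os.path.exists(moved) and not os.path.exists(copy)

print(exists, mojibake_exists, same, moved_ok)  # True False True True
```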

EDIT: If I print the file's path, the accented e is displayed as ├⌐, yet I can copy and paste é into the command prompt.
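The ├⌐ is exactly what the two UTF-8 bytes of é (0xC3 0xA9) look like when a CP437 console interprets each byte as a separate character. A small Python 3 sketch reproduces it:

```python
# UTF-8 bytes of "é" rendered through the console's CP437 code page.
shown = "é".encode("utf-8").decode("cp437")
print(shown)  # ├⌐

# Decoding with the codec the bytes were actually written in recovers "é".
restored = "é".encode("utf-8").decode("utf-8")
print(restored)  # é
```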

EDIT2: see below

Private    > cat scratch.py   ### LOL cat scratch :3
# coding=utf-8
file_name = r"Filéname"
file_name = unicode(file_name)
Private    > python scratch.py
Traceback (most recent call last):
  File "scratch.py", line 3, in <module>
    file_name = unicode(file_name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
Private    >
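In Python 2, unicode(s) with no encoding argument decodes with the default ASCII codec, which fails on the UTF-8 byte 0xC3 at index 3 ("F", "i", "l", then 0xC3). The Python 3 sketch below reproduces the same decode failure with bytes.decode and shows that naming the real encoding works:

```python
raw = "Filéname".encode("utf-8")  # bytes like the Python 2 str literal

ascii_error = None
try:
    raw.decode("ascii")  # what unicode(file_name) did implicitly
except UnicodeDecodeError as err:
    ascii_error = err
print(ascii_error)  # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

text = raw.decode("utf-8")  # what unicode(file_name, encoding="utf-8") returns
print(text)  # Filéname
```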

EDIT3:

Private    > PS1="Private    > " ; echo code below ; cat scratch.py ; echo =======  ; echo output below ; python scratch.py
code below
# -*- coding: utf-8 -*-

file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")

# I have code here to determine a path depending on the hostname of the
# machine, the folder paths contain no Unicode characters, for my debug
# version of the script, I will hardcode the redacted hostname.
hostname = "One"
if hostname == "One":
    folder = "C:/path/folder_one"
elif hostname == "Two":
    folder = "C:/path/folder_two"
else:
    folder = "C:/path/folder_three"

path = "%s/%s" % (folder, file_name)
path = unicode(path, encoding="utf-8")


print path
=======
output below
Traceback (most recent call last):
  File "scratch.py", line 18, in <module>
    path = unicode(path, encoding="utf-8")
TypeError: decoding Unicode is not supported
Private    >
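The TypeError comes from decoding twice: file_name is already unicode, so "%s/%s" % (folder, file_name) already produces unicode, and unicode(path, encoding="utf-8") then tries to decode text. Python 3 raises the analogous "decoding str is not supported". A minimal sketch of the usual remedy, decoding once at the boundary, follows. The helper to_text is hypothetical, and the folder is the placeholder path from the question.

```python
import os

def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes once, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

file_name = to_text("Filéname".encode("utf-8"))  # decoded exactly once
folder = "C:/path/folder_one"                     # placeholder path from the question
path = os.path.join(folder, file_name)            # already text, no second decode

double_decode_error = None
try:
    str(path, encoding="utf-8")  # Python 3 analogue of unicode(path, encoding="utf-8")
except TypeError as err:
    double_decode_error = err
print(double_decode_error)  # decoding str is not supported

print(to_text(path) == path)  # True: text passes through untouched
```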


python ascii non-ascii-characters character-set
1 Answer

You need to tell unicode() what encoding the string is in. In this case it is utf-8, not ascii, and the file header should be # -*- coding: utf-8 -*- (see Encoding Declarations).
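To check which source encoding Python derives from such a header, the standard-library function tokenize.detect_encoding reads the declaration the same way the interpreter does. This Python 3 sketch wraps it in a hypothetical helper, declared_encoding. Without a declaration, Python 3 falls back to UTF-8 (PEP 3120), whereas Python 2 fell back to ASCII, which is why the asker's script needed the comment.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Hypothetical helper: return the source encoding Python would use for these bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

with_cookie = '# -*- coding: utf-8 -*-\nfile_name = "Filéname"\n'.encode("utf-8")
latin1_cookie = b'# coding=latin-1\nfile_name = "Fil\xe9name"\n'
no_cookie = 'file_name = "Filéname"\n'.encode("utf-8")

print(declared_encoding(with_cookie))    # utf-8
print(declared_encoding(latin1_cookie))  # iso-8859-1 (normalized name)
print(declared_encoding(no_cookie))      # utf-8, the Python 3 default
```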

© www.soinside.com 2019 - 2024. All rights reserved.