Iterating over Unicode characters

Question

I want to loop over Unicode characters in Python like this:

hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                my_char = r"\u" + _1 + _2 + _3 + _4
                print(my_char)

As expected, this prints:

\u0000
\u0001
...
\uffff

I then tried to change the code above so that it prints the corresponding characters instead of the escape sequences:

hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                my_char = r"\u" + _1 + _2 + _3 + _4
                eval("print(my_char)")

But this produces the same output as the previous code.
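Why the eval version changes nothing: eval("print(my_char)") only evaluates the expression print(my_char), which prints the variable's current value. That value was built from a raw string, so it holds a literal backslash, a "u", and four hex digits. Escape sequences are interpreted only when the parser reads a string literal in source code, not when a variable is printed. A minimal sketch of the difference, using U+0041 purely as an illustrative code point (escape_text and real_char are hypothetical names):

```python
# Raw string: the backslash is kept, giving six separate characters.
escape_text = r"\u0041"
# Normal literal: the parser turns \u0041 into a single character.
real_char = "\u0041"

print(len(escape_text), escape_text)  # prints: 6 \u0041
print(len(real_char), real_char)      # prints: 1 A

# Evaluating print(escape_text) just prints the same six characters again.
eval("print(escape_text)")            # prints: \u0041
```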

hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                eval("print(" + r"\u" + f"{_1}{_2}{_3}{_4})")

Something like this, on the other hand, raises the following error message:

eval("print(" + r"\u" + f"{_1}{_2}{_3}{_4})")
  File "<string>", line 1
    print(\u0000)
                ^
SyntaxError: unexpected character after line continuation character
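The SyntaxError happens because the generated source is print(\u0000) with no quotes around the escape. Outside a string literal, the backslash is read as a line-continuation character. A sketch of two ways to turn the escape text into the actual character, again using the illustrative digits 0041 (not taken from the question): building a quoted literal for eval, or avoiding eval altogether by decoding the text with Python's built-in unicode_escape codec:

```python
digits = "0041"  # illustrative four hex digits, not taken from the question

# Option 1: give eval a *quoted* literal, i.e. the source text "\u0041".
source = '"\\u' + digits + '"'
via_eval = eval(source)

# Option 2 (no eval): decode the escape text with the unicode_escape codec.
via_codec = ("\\u" + digits).encode("ascii").decode("unicode_escape")

print(via_eval, via_codec)  # prints: A A
```

In practice chr(int(digits, 16)) gives the same result more directly.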

What would make this code work as intended?

Answer 1

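Python strings are already Unicode. Unicode isn't some kind of escape sequence: it's a standard that assigns every character a numeric code point, and encodings such as UTF-8 then map those code points to bytes.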
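To illustrate the difference between a character, its code point, and its encoded bytes, here is a minimal sketch using U+00E9 (é) as an arbitrary example:

```python
s = "\u00e9"  # one character: LATIN SMALL LETTER E WITH ACUTE

print(len(s))             # prints: 1 (one character in the str)
print(hex(ord(s)))        # prints: 0xe9 (its Unicode code point)
print(s.encode("utf-8"))  # prints: b'\xc3\xa9' (two bytes in UTF-8)
```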

Given that, you can use chr to convert a Unicode code point into a string containing that character, for example:

print(chr(1081))

As the function's documentation says:

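Return the string representing a character whose Unicode code point is the integer i. For example, chr(97) returns the string 'a', while chr(8364) returns the string '€'. This is the inverse of ord().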

The valid range for the argument is from 0 through 1,114,111 (0x10FFFF in base 16).
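A quick check of the documented behaviour: chr and ord round-trip, 0x10FFFF (1,114,111) is the largest accepted value, and anything larger raises ValueError. The variable name max_cp is just for illustration:

```python
# chr() and ord() are inverses of each other.
assert chr(97) == "a" and ord("a") == 97
assert chr(8364) == "\u20ac"  # the euro sign

max_cp = 0x10FFFF  # == 1,114,111, the top of the valid range
print(max_cp, hex(ord(chr(max_cp))))  # prints: 1114111 0x10ffff

try:
    chr(max_cp + 1)  # one past the documented maximum
except ValueError as exc:
    print("ValueError:", exc)
```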

A simple loop can generate every valid character. Actually printing them is another matter:

for i in range(0x110000):  # 0 through 1,114,111 inclusive
    print(chr(i))

Running this on my machine eventually failed with:

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

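That value can't be converted into a form my terminal can print: U+D800 is a surrogate code point (the range U+D800 to U+DFFF is reserved for UTF-16 surrogate pairs), and the UTF-8 codec refuses to encode surrogates on their own.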
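One way around the error, sketched under the assumption that the goal is to print only characters a terminal can reasonably show: skip the surrogate block U+D800 to U+DFFF, which UTF-8 cannot encode, and anything str.isprintable() rejects, such as control characters and unassigned code points. printable_chars is a hypothetical helper name:

```python
def printable_chars(limit=0x110000):
    """Yield every character below `limit` that is safe to print as UTF-8.

    Hypothetical helper: skips the surrogates U+D800..U+DFFF explicitly
    (str.isprintable() would reject them anyway) plus non-printable
    characters such as controls and unassigned code points.
    """
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8
        ch = chr(cp)
        if ch.isprintable():
            yield ch


# Usage: the printable ASCII range, space through '~'.
ascii_printable = "".join(printable_chars(0x80))
print(ascii_printable)
```

Printing the full range this way still produces a very large amount of output, and characters without a glyph in the terminal's font may show up as boxes.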

Answer 2

I'd suggest using itertools in this case, for example:

from itertools import product

for comb in product("0123456789abcdef", repeat=4):
    print(rf"\u{''.join(comb)}")
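The loop above still prints the escape text rather than the characters. If the characters themselves are wanted, here is a sketch of the same product-based approach that parses each four-digit combination with int(..., 16) and passes the result to chr. bmp_chars is a hypothetical name, and it covers only U+0000 to U+FFFF, the range four hex digits can express:

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def bmp_chars():
    """Yield chr(cp) for every code point U+0000..U+FFFF, in order."""
    for comb in product(HEX_DIGITS, repeat=4):
        cp = int("".join(comb), 16)  # e.g. ('0', '0', '4', '1') -> 0x41
        yield chr(cp)


chars = list(bmp_chars())
print(len(chars), chars[0x41])  # prints: 65536 A
```

This is equivalent to map(chr, range(0x10000)); printing the result still has to skip U+D800 to U+DFFF to avoid the surrogate error.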
