For example, I read this from file 1:
ch="Hello world, this a stackoverflow example"
I write it to file 2 as UTF-16 Unicode, and the output must look like this:
output="\u0048\u0065\u006c\u006c\u006f \u0077\u006f\u0072\u006c\u0064\u002c \u0074\u0068\u0069\u0073 \u0061 \u0073\u0074\u0061\u0063\u006b\u006f\u0076\u0065\u0072 \u0066\u006c\u006f\u0077 \u0065\u0078\u0061\u006d\u0070\u006c\u0065"
I have found how to convert it and how to read it, but not how to write the converted text.
Just pass encoding="utf-16" to open when you open the output file:
ch="Hello world, this a stackoverflow example"
with open("utf_16.txt", "w", encoding="utf-16") as f:
    f.write(ch)
$ file utf_16.txt
utf_16.txt: Little-endian UTF-16 Unicode text, with no line terminators
$ hexdump -Cv utf_16.txt
00000000 ff fe 48 00 65 00 6c 00 6c 00 6f 00 20 00 77 00 |..H.e.l.l.o. .w.|
00000010 6f 00 72 00 6c 00 64 00 2c 00 20 00 74 00 68 00 |o.r.l.d.,. .t.h.|
...
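The same check can be done from Python when file and hexdump are not available. This is only a minimal sketch: it writes to a temporary directory (an assumption, so it does not touch real files) and compares the first two bytes against codecs.BOM_UTF16, which is the marker in the platform's native byte order.

```python
import codecs
import os
import tempfile

ch = "Hello world, this a stackoverflow example"

# Write to a temporary directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "utf_16.txt")
    with open(path, "w", encoding="utf-16") as f:
        f.write(ch)
    # Read the raw bytes back, without any decoding.
    with open(path, "rb") as f:
        raw = f.read()

# The file starts with a 2-byte marker in the platform's native byte order.
has_marker = raw.startswith(codecs.BOM_UTF16)
# Decoding as utf-16 consumes the marker and recovers the original string.
round_trip = raw.decode("utf-16")
print(has_marker, len(raw), round_trip == ch)
```

Every character in this string takes one 2-byte code unit, so the file length is two bytes for the marker plus twice the string length.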
Note that the utf-16 encoding includes a byte order mark (BOM). If you don't want that, add the endianness to the encoding name (for example, utf-16le):
ch="Hello world, this a stackoverflow example"
with open("utf_16.txt", "w", encoding="utf-16le") as f:
    f.write(ch)
$ file utf_16.txt
utf_16.txt: data
$ hexdump -Cv utf_16.txt
00000000 48 00 65 00 6c 00 6c 00 6f 00 20 00 77 00 6f 00 |H.e.l.l.o. .w.o.|
00000010 72 00 6c 00 64 00 2c 00 20 00 74 00 68 00 69 00 |r.l.d.,. .t.h.i.|
...
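If what is needed is the literal \uXXXX text shown in the question, rather than UTF-16 encoded bytes, the escapes have to be built by hand. The following is a rough sketch under that assumption, not a definitive answer: to_u_escapes is a hypothetical helper name, and leaving spaces unescaped is inferred from the sample output in the question.

```python
def to_u_escapes(text):
    # Hypothetical helper: writes each UTF-16 code unit as a backslash-u
    # escape with four hex digits. Spaces are kept as-is (an assumption
    # based on the sample output in the question).
    out = []
    for char in text:
        if char == " ":
            out.append(char)
            continue
        # Big-endian bytes without a BOM: 2 bytes per code unit,
        # 4 bytes (a surrogate pair) for characters outside the BMP.
        units = char.encode("utf-16-be")
        for i in range(0, len(units), 2):
            code_unit = int.from_bytes(units[i:i + 2], "big")
            out.append("\\u{:04x}".format(code_unit))
    return "".join(out)


ch = "Hello world, this a stackoverflow example"
escaped = to_u_escapes(ch)
print(escaped)
```

The result is plain ASCII text, so it can be written with an ordinary open(..., "w") call and any ASCII-compatible encoding.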