How to fix garbled Unicode characters in a CSV loaded by Excel?

Question

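I have a set of CSV files that may contain special characters. When I open these files in a text editor, they display fine. However, when I open them in Excel, it interprets the special characters as the Windows version of ANSI (I think) and turns each one into 2-3 different characters. See the image below for what various Unicode characters are converted to (shown as a table, with each character's code point given by the left column and the top row).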

How can I convert these characters back to the original characters? I can use formulas or VBA.

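Note: I have seen some answers where adding a ZWNBSP (U+FEFF, which at the start of a file acts as the byte order mark) to the CSV file fixes the problem. The problem with this solution is that I receive these files from an external source, so I (or my users) would have to add the character manually every time. Is there another way?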
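If a preprocessing step before opening the files in Excel is acceptable, the BOM does not have to be added by hand. A minimal sketch in Python, assuming the incoming files are already UTF-8; the helper names `with_utf8_bom` and `add_utf8_bom` are hypothetical:

```python
import codecs
from pathlib import Path


def with_utf8_bom(data: bytes) -> bytes:
    """Return data prefixed with the UTF-8 BOM (EF BB BF) unless it already has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_utf8_bom(path: Path) -> bool:
    """Rewrite the file with a BOM in front; return True if the file was changed.

    Assumes the file content is UTF-8 already; it does not re-encode anything.
    """
    data = path.read_bytes()
    fixed = with_utf8_bom(data)
    if fixed == data:
        return False
    path.write_bytes(fixed)
    return True
```

For a folder of incoming files, something like `for p in Path("incoming").glob("*.csv"): add_utf8_bom(p)` would process them all; the folder name is a placeholder. Because the BOM check is idempotent, running it twice on the same file is harmless.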

Thanks!

I tried using the StrConv function, but it doesn't seem to work:

Sub MyTest()
    Dim s0 As String
    Dim s1 As String
    Dim s2 As String

    s0 = ChrW(192)                  ' Correct character: À
    '                               ' This would appear normally when opening
    '                               ' the file in a text editor
    Debug.Print s0

    s1 = ChrW(195) & ChrW(8364)     ' ANSI representation of s0: Ã€
    '                               ' This is what would appear when opening
    '                               ' the file in Excel
    Debug.Print s1

    s2 = StrConv(s1, vbFromUnicode) ' Returns ?
    Debug.Print s2

    s2 = StrConv(s1, vbUnicode)     ' Returns Ã ¬
    Debug.Print s2
End Sub
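StrConv only converts between Unicode and the system's ANSI code page, so neither call in MyTest ever decodes anything as UTF-8. The Python model below reproduces both results under two assumptions: the ANSI code page is 1252, and a VBA String holds UTF-16LE code units. On that model, vbFromUnicode produces the right bytes [195, 128] but packs them into the single code unit U+80C3, which the VBA editor presumably cannot display and prints as ?; vbUnicode widens the string's own bytes into Ã, a NUL, ¬ and a space, which matches the "Ã ¬" output. The printed values are what Python produces, not output captured from VBA.

```python
# Model of the two StrConv calls in MyTest, assuming the system ANSI code page
# is 1252 and that a VBA String stores UTF-16LE code units. Not VBA itself.
s1 = chr(195) + chr(8364)  # "Ã€", same as ChrW(195) & ChrW(8364)

# StrConv(s1, vbFromUnicode): Unicode -> ANSI bytes. Packed back into a
# String, the two bytes form a single UTF-16LE code unit.
ansi_bytes = s1.encode("cp1252")
from_unicode = ansi_bytes.decode("utf-16-le")
print(list(ansi_bytes), hex(ord(from_unicode)))  # [195, 128] 0x80c3

# StrConv(s1, vbUnicode): the String's raw UTF-16LE bytes are each read as an
# ANSI byte and widened to a separate character.
raw_bytes = s1.encode("utf-16-le")
to_unicode = raw_bytes.decode("cp1252")
print(list(raw_bytes), repr(to_unicode))  # [195, 0, 172, 32] 'Ã\x00¬ '

# The step StrConv cannot do: decode the ANSI bytes as UTF-8 (code page 65001).
fixed = ansi_bytes.decode("utf-8")
print(fixed)  # À
```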

Answer 1

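What seems to be happening is that the data is encoded as UTF-8 but interpreted as code page 1252 (Microsoft's extended version of ISO 8859-1). For example, code point 192 is encoded as the UTF-8 byte sequence [195, 128], and those bytes are then decoded as code page 1252 into the code points [195, 8364].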
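The two steps can be reproduced for a few characters in Python. The helper name `simulate_excel_misread` is hypothetical and only models what Excel appears to do; the first row reproduces the numbers above, and the € row shows how one character can turn into three characters rather than two.

```python
def simulate_excel_misread(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as code page 1252."""
    # Caveat: Python's cp1252 codec raises UnicodeDecodeError on the five
    # undefined bytes 0x81, 0x8D, 0x8F, 0x90, 0x9D (e.g. "Á" -> [195, 129]).
    return text.encode("utf-8").decode("cp1252")


for ch in ["À", "é", "€"]:
    misread = simulate_excel_misread(ch)
    print(ch, list(ch.encode("utf-8")), [ord(c) for c in misread], misread)

# À [195, 128] [195, 8364] Ã€
# é [195, 169] [195, 169] Ã©
# € [226, 130, 172] [226, 8218, 172] â‚¬
```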

This should be fixable by reversing the encoding and decoding steps.

Python example:

>>> original = chr(192)
>>> print(original)
À
>>> broken = original.encode("utf-8").decode("cp1252")
>>> print(broken)
Ã€
>>> fixed = broken.encode("cp1252").decode("utf-8")
>>> fixed
'À'
>>> fixed == original
True
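The one-liner above assumes every character survives the cp1252 round trip. A slightly more defensive sketch (the function name `fix_mojibake` is hypothetical) also handles the five byte values that code page 1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), for which `broken.encode("cp1252")` raises UnicodeEncodeError in Python, and it leaves cells that don't look like misread UTF-8 unchanged. It assumes Excel passed those undefined bytes through as U+0081, U+008D and so on, e.g. showing Á (UTF-8 [195, 129]) as Ã followed by U+0081.

```python
# cp1252 leaves these five byte values undefined. Assumption: Excel passed
# them through as the control characters U+0081, U+008D, U+008F, U+0090, U+009D.
UNDEFINED_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def fix_mojibake(text: str) -> str:
    """Undo a UTF-8 -> cp1252 misread; return text unchanged if it isn't one."""
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) in UNDEFINED_CP1252:
                raw.append(ord(ch))  # map the passed-through byte back as-is
            else:
                return text  # not representable in cp1252, so not this kind of damage
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return text  # bytes are not valid UTF-8: probably genuine cp1252 text


print(fix_mojibake("Ã€"))     # À
print(fix_mojibake("Ã\x81"))  # Á
print(fix_mojibake("café"))   # café (left unchanged)
```

The heuristic can misfire on text that is legitimately cp1252 and happens to form valid UTF-8 (for example a cell that really contains "Ã©"), so it is safest to apply it only to columns known to be affected.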

I don't have a Windows machine available to test on right now, but this page describes how to use the MultiByteToWideChar function from kernel32 with code page 65001 (i.e. UTF-8): https://di-mgt.com.au/howto-convert-vba-unicode-to-utf8.html
