How to get a C++ std::string from Little-Endian UTF-16 encoded bytes

Question

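I have a third-party device that communicates with a Linux box over a poorly documented proprietary protocol. Some packets carry "strings" that, after reading this Joel On Software article, appear to be encoded as UTF-16 Little-Endian. In other words, what I have on my Linux box after receiving such a packet looks like this: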

// The string "Out"
unsigned char data1[] = {0x4f, 0x00, 0x75, 0x00, 0x74, 0x00, 0x00, 0x00};

// The string "°F"
unsigned char data2[] = {0xb0, 0x00, 0x46, 0x00, 0x00, 0x00};

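As far as I understand, I cannot treat these as std::wstring, because wchar_t is 4 bytes on Linux. I do have one thing going for me, though: my Linux box is also Little-Endian. So I believe I need to use something like std::codecvt_utf8_utf16<char16_t>. However, even after reading the documentation, I cannot figure out how to actually get from unsigned char[] to std::string. Can someone help?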
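To make the byte layout concrete, here is a minimal sketch of how such a buffer pairs up into 16-bit code units. The helper name utf16le_bytes_to_units is hypothetical (it is not part of the original post or of any library), and the sketch assumes the buffer length is known and that a 0x0000 code unit marks the end of the string, as in data1 and data2. Because it assembles each unit from two bytes explicitly, it does not depend on the host being Little-Endian.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): builds UTF-16 code units
// from little-endian bytes without relying on host endianness or alignment.
// Stops at the first 0x0000 code unit or when fewer than two bytes remain.
std::u16string utf16le_bytes_to_units(const unsigned char* bytes, std::size_t len) {
    assert(bytes != nullptr || len == 0);
    std::u16string units;
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        // In little-endian order the low byte comes first.
        const char16_t unit =
            static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8));
        if (unit == 0) break;  // terminator, as seen at the end of data1/data2
        units.push_back(unit);
    }
    return units;
}
```

Calling utf16le_bytes_to_units(data2, sizeof data2) on the buffer above should yield the two code units 0x00B0 and 0x0046.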

Answer 1

If you want to use std::codecvt (deprecated since C++17), you can wrap the UTF-16 text in a std::u16string and then convert it to UTF-8 as needed.

That is:

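#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

// Simply cast the raw data for the constructor, since we know the char
// array is actually a byte array from the network API. This relies on a
// little-endian host, suitably aligned data, and a 0x0000 terminator.
std::u16string u16_str( reinterpret_cast<const char16_t*>(data2) );

// UTF-16/char16_t to UTF-8
std::string u8_conv = std::wstring_convert<std::codecvt_utf8_utf16<char16_t>,char16_t>{}.to_bytes(u16_str);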
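Since std::wstring_convert and std::codecvt_utf8_utf16 are deprecated, one possible alternative is to do the UTF-16 to UTF-8 step by hand. The following is only a sketch under stated assumptions, not the answer's method: the names append_utf8 and utf16_to_utf8 are hypothetical, and replacing unpaired surrogates with U+FFFD is a design choice made here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends one Unicode code point to `out` as UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: converts UTF-16 code units to a UTF-8 std::string.
// Valid surrogate pairs are combined; unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine high and low surrogate into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: substitute replacement character
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Passing the u16_str built from data2 to utf16_to_utf8 should give the three UTF-8 bytes 0xC2 0xB0 0x46, which a UTF-8 terminal shows as "°F".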
