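I am reading a C++ std::string and passing it to a function that analyzes it and extracts the Unicode symbols and plain ASCII symbols from it.
I searched many tutorials online, but all of them say that standard C++ does not fully support Unicode. Many of them suggest using ICU for C++.
Here is my C++ program for learning the basics of the above. It reads a raw string, converts it to an ICU Unicode string, and prints it: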
#include <iostream>
#include <string>
#include "unicode/unistr.h"
int main()
{
    std::string s="Hello☺";
    // at this point s contains a line of text
    // which may be ANSI or UTF-8 encoded
    // convert std::string to ICU's UnicodeString
    icu::UnicodeString ucs = icu::UnicodeString::fromUTF8(icu::StringPiece(s.c_str()));
    // convert UnicodeString to std::wstring
    std::wstring ws;
    for (int i = 0; i < ucs.length(); ++i)
        ws += static_cast<wchar_t>(ucs[i]);
    std::wcout << ws << std::endl;
}
Expected output:
Hello☺
Actual output:
Hello?
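Please point out what I am doing wrong. Suggestions for an alternative or simpler approach are also welcome.
Thanks
Update: the working code is below. Special thanks to @ <> for the solution. I wish I could give his solution more than one upvote!!! :)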
#include <iostream>
#include <string>
#include <locale>
#include "unicode/unistr.h"
void f(const std::string & s)
{
    std::wcout << "Inside called function" << std::endl;
    constexpr char locale_name[] = "";
    setlocale( LC_ALL, locale_name );
    std::locale::global(std::locale(locale_name));
    std::ios_base::sync_with_stdio(false);
    std::wcin.imbue(std::locale());
    std::wcout.imbue(std::locale());
    // at this point s contains a line of text which may be ANSI or UTF-8 encoded
    // convert std::string to ICU's UnicodeString
    icu::UnicodeString ucs = icu::UnicodeString::fromUTF8(icu::StringPiece(s.c_str()));
    // convert UnicodeString to std::wstring
    std::wstring ws;
    for (int i = 0; i < ucs.length(); ++i)
    {
        ws += static_cast<wchar_t>(ucs[i]);
        std::wcout << static_cast<wchar_t>(ucs[i]) << std::endl;
    }
    std::wcout << ws << std::endl;
}
int main()
{
    constexpr char locale_name[] = "";
    setlocale( LC_ALL, locale_name );
    std::locale::global(std::locale(locale_name));
    std::ios_base::sync_with_stdio(false);
    std::wcin.imbue(std::locale());
    std::wcout.imbue(std::locale());
    std::wcout << "Inside main function" << std::endl;
    std::string s=u8"hello☺";
    // at this point s contains a line of text which may be ANSI or UTF-8 encoded
    // convert std::string to ICU's UnicodeString
    icu::UnicodeString ucs = icu::UnicodeString::fromUTF8(icu::StringPiece(s.c_str()));
    // convert UnicodeString to std::wstring
    std::wstring ws;
    for (int i = 0; i < ucs.length(); ++i)
    {
        ws += static_cast<wchar_t>(ucs[i]);
        std::wcout << static_cast<wchar_t>(ucs[i]) << std::endl;
    }
    std::wcout << ws << std::endl;
    std::wcout << "--------------------------------" << std::endl;
    f(s);
    return 0;
}
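There are several stumbling blocks on the way to solving this:
The std::string must actually hold UTF-8. For ☺ (U+263A) that means the three bytes 0xE2 0x98 0xBA.
The u8 prefix marks a string literal as containing UTF-8 data: u8"Hello☺" (up to C++17; in C++20 such a literal has type const char8_t[] and no longer initializes a std::string directly).
The documentation for icu::UnicodeString states that it stores Unicode as UTF-16. In this case you are lucky, because U+263A fits in a single UTF-16 code unit. Other emoji may not! You should either convert to UTF-32, or be very careful and use the char32At function instead of indexing code units.
The encoding used by wcout should be configured with imbue to match the encoding your environment expects. See the answers to this question.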
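To make the UTF-16 point concrete, here is a minimal sketch of what "convert to UTF-32" involves. It deliberately uses only the standard library, so it is not the ICU way of doing it; utf16_to_utf32 is a hypothetical helper name invented for this illustration, and passing unpaired surrogates through unchanged is an assumption about how you would want malformed input handled. With ICU you would more likely rely on char32At or a similar code-point-aware call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ICU or the standard library):
// decode UTF-16 code units into UTF-32 code points, combining each
// high/low surrogate pair into a single code point.
std::u32string utf16_to_utf32(const std::u16string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < in.size()) {
            const char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Standard surrogate-pair formula.
                out += static_cast<char32_t>(
                    0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
                ++i; // the low surrogate has been consumed too
                continue;
            }
        }
        // BMP character, or an unpaired surrogate passed through as-is
        // (an assumption; a stricter version could report an error).
        out += unit;
    }
    return out;
}
```

For u"\u263A" the input and output both have length 1, which is why the per-unit cast in the question happens to work for ☺. For u"\U0001F600" the input has two code units and the output has one code point, U+1F600; casting each unit to wchar_t separately would produce two surrogate halves instead.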