Converting non-ASCII characters to their English equivalents in C++

Question (votes: 0, answers: 1)

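I need to compare data collected from several sources, some of which contains non-ASCII characters, specifically English letters with accents. For example, for "Frédérik Gauthier" my debug output shows the values -61, -87, -61, -87. When I look at the int values of the characters, I notice that each of these characters is always a combination of two "chars": the first has the value -61, which indicates that the letter is accented, and the second, here -87, identifies the letter, in this case an accented "e". My goal is simply to "remove" the accent and keep the plain English letter. Obviously I can't rely on this behavior across systems, so how do you handle this situation? std::string handles the characters without any problem, but as soon as I work at the char level, things break. Any guidance?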

#include <iostream>
#include <fstream>
#include <algorithm>
#include <string>
#include <ctype.h>  // isascii() is POSIX, not standard C++

int main(int argc, char** argv){
    if(argc < 2){
        std::cerr << "usage: " << argv[0] << " <file>" << std::endl;
        return 1;
    }
    std::fstream fin;
    std::string line;
    bool leave = false;
    fin.open(argv[1], std::ios::in);

    while(getline(fin, line)){
        // Replace the second byte of each accented letter with its base letter.
        // These values assume UTF-8 input and a signed char.
        std::for_each(line.begin(), line.end(), [](char &a){
            if(!isascii(a)) {
                if(int(a) == -68) a = 'u';
                else if(int(a) == -74) a = 'o';
                else if(int(a) == -83) a = 'i';
                else if(int(a) == -85) a = 'e';
                else if(int(a) == -87) a = 'e';
                else if(int(a) == -91) a = 'a';
                else if(int(a) == -92) a = 'a';
                else if(int(a) == -95) a = 'a';
                else if(int(a) == -120) a = 'n';
            }
        });
        // Drop every byte that is still non-ASCII (e.g. the -61 lead byte).
        line.erase(std::remove_if(line.begin(), line.end(), [](char a){ return !isascii(a); }),
                   line.end());
        std::cout << line << std::endl;
        std::for_each(line.begin(), line.end(), [&leave](char &a){
            if(!isascii(a)) {
                std::cout << a << " : " << int(a);
            }
        });
        if(leave){
            fin.close();
            return 1;
        }
    }
    fin.close();
    return 0;
}
c++ ascii non-ascii-characters
1 answer
0 votes

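In general this is a tricky task, and you may need to adapt the solution to your specific case. To transliterate a string from an arbitrary encoding to ASCII, it is better to rely on a library than to implement it yourself. Here is an example using iconv: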

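#include <iconv.h>
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cerrno>
#include <clocale>
#include <iostream>
#include <memory>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>
using namespace std;

// C++20: u8"..." literals are char8_t arrays, so copy the bytes into a plain std::string.
string from_u8string(const u8string &s) {
  return string(s.begin(), s.end());
}

// Owns an iconv_t and calls iconv_close when it goes out of scope.
using iconv_handle = unique_ptr<remove_pointer<iconv_t>::type, decltype(&iconv_close)>;

iconv_handle make_converter(string_view to, string_view from) {
    // iconv_open expects NUL-terminated names; a string_view does not guarantee one.
    auto raw_converter = iconv_open(string(to).c_str(), string(from).c_str());
    if (raw_converter == (iconv_t)-1) {
        throw system_error(errno, system_category());
    }
    return { raw_converter, iconv_close };
}

string convert_to_ascii(string input, string_view encoding) {
    iconv_handle converter = make_converter("ASCII//TRANSLIT", encoding);

    char* input_data = input.data();
    size_t input_left = input.size();

    // Transliteration can expand text (one character may become several),
    // so grow the buffer on E2BIG instead of assuming a fixed size is enough.
    string output(input.size() * 2 + 16, '\0');
    size_t used = 0;
    while (input_left > 0) {
        char* out_ptr = output.data() + used;
        size_t out_left = output.size() - used;
        size_t rc = iconv(converter.get(), &input_data, &input_left, &out_ptr, &out_left);
        used = output.size() - out_left;
        if (rc == (size_t)-1) {
            if (errno != E2BIG) {
                throw system_error(errno, system_category());  // invalid or truncated input
            }
            output.resize(output.size() * 2);
        }
    }
    output.resize(used);  // drop the unused, zero-filled tail of the buffer
    return output;
}

string convert_to_plain_ascii(string_view input, string_view encoding) {
    auto converted = convert_to_ascii(string{ input }, encoding);
    // Keep letters only: some iconv implementations add marks (e.g. an apostrophe)
    // when transliterating accented letters. Note this also removes spaces.
    converted.erase(
        remove_if(converted.begin(), converted.end(),
                  [](unsigned char c) { return !isalpha(c); }),
        converted.end()
    );
    return converted;
}

int main() {
    // glibc takes its //TRANSLIT rules from the LC_CTYPE locale; in the default "C"
    // locale accented letters usually come out as '?', so use the environment's locale.
    setlocale(LC_ALL, "");
    try {
        auto converted_utf8 = convert_to_plain_ascii(from_u8string(u8"Frédérik"), "UTF-8");
        assert(converted_utf8 == "Frederik");
        // Write the windows-1252 bytes explicitly (0xE9 is 'é') so the test does not
        // depend on the encoding this source file is saved in.
        auto converted_1252 = convert_to_plain_ascii("Fr\xE9" "d\xE9rik", "windows-1252");
        assert(converted_1252 == "Frederik");
    } catch (system_error& e) {
        cout << "Error " << e.code() << ": " << e.what() << endl;
    }
}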