How to detect whether Chinese text is Simplified or Traditional [duplicate]

Question (votes: 0, answers: 1)

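What is a reliable way in Java to detect whether a Chinese Unicode string contains Simplified or Traditional Chinese characters? Assume that, by default, characters common to both the simplified and traditional ranges are treated as simplified.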

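Ideally, this would be a regex match that checks specific Unicode character ranges. Are these ranges documented and defined, and is this approach reliable?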
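For reference on the ranges asked about: Java does expose the Unicode Han script as a documented property, both in regular expressions (`\p{IsHan}`) and as `Character.UnicodeScript.HAN`. The minimal sketch below (the class and method names are illustrative, not from the original) shows that this property matches a simplified character and its traditional counterpart alike, so a range or script check can detect Chinese ideographs but cannot by itself separate the two forms.

```java
import java.util.regex.Pattern;

public class HanScriptCheck {
    // \p{IsHan} matches any code point whose Unicode script property is Han.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");

    // True if the string contains at least one Han ideograph.
    static boolean containsHan(String s) {
        return HAN.matcher(s).find();
    }

    // Same test for a single code point, via the Character API.
    static boolean isHanCodePoint(int cp) {
        return Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
    }

    public static void main(String[] args) {
        String simplified = "汉";   // U+6C49
        String traditional = "漢";  // U+6F22, traditional form of the same word
        System.out.println(containsHan(simplified));   // true
        System.out.println(containsHan(traditional));  // true
        System.out.println(containsHan("Hello"));      // false
        System.out.println(isHanCodePoint(traditional.codePointAt(0))); // true
    }
}
```

Both variants give the same answer, which is why the Unicode ranges are reliable for "is this Chinese?" but not for "which form?".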

java cjk chinese-locale chinese-simplified chinese-traditional
1 answer
0 votes
public class ChineseCharacterDetector {
    public static boolean containsSimplifiedChinese(String input) {
        // Iterate over code points rather than chars so that supplementary
        // characters (e.g. CJK Extension B, above U+FFFF) are seen whole.
        return input.codePoints().anyMatch(ChineseCharacterDetector::isSimplifiedChinese);
    }

    public static boolean containsTraditionalChinese(String input) {
        return input.codePoints().anyMatch(ChineseCharacterDetector::isTraditionalChinese);
    }

    private static boolean isSimplifiedChinese(int codePoint) {
        // CJK Unified Ideographs block. Note: this block holds BOTH simplified
        // and traditional characters, so it cannot tell the two forms apart.
        return codePoint >= 0x4E00 && codePoint <= 0x9FFF;
    }

    private static boolean isTraditionalChinese(int codePoint) {
        // Han ideograph blocks; none of them is specific to traditional Chinese.
        return (codePoint >= 0x4E00 && codePoint <= 0x9FFF)    // CJK Unified Ideographs
            || (codePoint >= 0x3400 && codePoint <= 0x4DBF)    // Extension A
            || (codePoint >= 0x20000 && codePoint <= 0x2A6DF); // Extension B (beyond char range)
    }

    public static void main(String[] args) {
        String input = "你好,世界!Hello, 世界!";
        
        if (containsSimplifiedChinese(input)) {
            System.out.println("Contains Simplified Chinese characters");
        } else if (containsTraditionalChinese(input)) {
            System.out.println("Contains Traditional Chinese characters");
        } else {
            System.out.println("Contains neither Simplified nor Traditional Chinese characters");
        }
    }
}

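The isSimplifiedChinese function checks for characters in the CJK Unified Ideographs block (U+4E00–U+9FFF), while isTraditionalChinese checks that block plus the CJK Extension A and Extension B blocks. The containsSimplifiedChinese and containsTraditionalChinese functions iterate over the input text, looking for characters in those ranges. Note, however, that Unicode defines these blocks for Han ideographs in general, and they contain simplified and traditional characters alike. Because the two checks overlap, any text with a character in U+4E00–U+9FFF is reported as simplified, and the traditional branch is reached only when none of the text's characters fall in that block. The code therefore detects Chinese characters, not which form they are written in.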
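Since the Han ranges are shared, telling the two forms apart needs per-character data rather than ranges. Here is a minimal sketch of that alternative technique, a variant lookup table, applying the question's default (characters shared by both forms count as simplified). The `TRADITIONAL_ONLY` set is a hypothetical ten-character sample for illustration only; a usable detector would need a complete table, for example one derived from the Unihan database's `kSimplifiedVariant`/`kTraditionalVariant` fields. The class name and `classify` method are likewise illustrative.

```java
import java.util.Set;
import java.util.stream.Collectors;

public class VariantTableDetector {
    enum Variant { SIMPLIFIED, TRADITIONAL, NONE }

    // HYPOTHETICAL sample table: a few traditional-only characters
    // (e.g. 漢 vs 汉, 語 vs 语). A real table would be far larger.
    private static final Set<Integer> TRADITIONAL_ONLY =
            "漢語體國學們這個說來".codePoints().boxed().collect(Collectors.toSet());

    static boolean isHan(int cp) {
        return Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
    }

    // Any traditional-only character makes the text TRADITIONAL; otherwise any
    // Han character (including ones shared by both forms) counts as SIMPLIFIED,
    // following the default stated in the question.
    static Variant classify(String text) {
        if (text.codePoints().anyMatch(cp -> TRADITIONAL_ONLY.contains(cp))) {
            return Variant.TRADITIONAL;
        }
        if (text.codePoints().anyMatch(VariantTableDetector::isHan)) {
            return Variant.SIMPLIFIED;
        }
        return Variant.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("漢語"));   // TRADITIONAL
        System.out.println(classify("汉语"));   // SIMPLIFIED
        System.out.println(classify("你好"));   // SIMPLIFIED (shared characters)
        System.out.println(classify("Hello"));  // NONE
    }
}
```

With this rule, a single traditional-only character is enough to classify the text as traditional, and any gap in the table silently falls back to SIMPLIFIED, so accuracy depends entirely on the completeness of the mapping data.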

© www.soinside.com 2019 - 2024. All rights reserved.