In Java, is a char two bytes?

Question

In Java, a char is two bytes.

So why does the following code print 2 rather than 4?

public static void main(String[] args) {
    byte[] b = new String(new char[] { 'H', 'I' }).getBytes();
    System.out.println(b.length);
}

Answer 1

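A char is two bytes in memory (one UTF-16 code unit), but getBytes() returns the encoded form of the string, and its size depends on the charset. With no argument, getBytes() encodes the Unicode characters using the JVM's default Charset, which is typically ISO-8859-1 or UTF-8. Both of these store 'H' and 'I' in a single byte each, so the array has length 2.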
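To avoid depending on the platform default, the charset can be passed to getBytes() explicitly. The following is only a minimal sketch: the class name ExplicitCharsetDemo and the helper encodedLength are made up for illustration and are not part of the original answer. It prints the charset that the no-argument getBytes() falls back to, then encodes the same two characters with three explicit charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named only for this example.
class ExplicitCharsetDemo {

    // Returns how many bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = new String(new char[] { 'H', 'I' });

        // The no-argument getBytes() uses this charset; its value depends on the JVM and platform.
        System.out.println("default charset: " + Charset.defaultCharset());

        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 2
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(text, StandardCharsets.UTF_16LE));   // 4
    }
}
```

Passing a Charset constant rather than a charset name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.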

The following code should help illustrate what is happening.

public static void main(String[] args) throws Exception {
    test("ISO-8859-1", new char[] { 'H', 'I' });
    test("UTF-8"     , new char[] { 'H', 'I' });
    test("UTF-16LE"  , new char[] { 'H', 'I' });
    test("UTF-32LE"  , new char[] { 'H', 'I' });
    test("ISO-8859-1", new char[] { '⅓', '⅔' });
    test("UTF-8"     , new char[] { '⅓', '⅔' });
    test("UTF-16LE"  , new char[] { '⅓', '⅔' });
    test("UTF-32LE"  , new char[] { '⅓', '⅔' });
    test("UTF-8"     , "😀👍");
    test("UTF-16LE"  , "😀👍");
    test("UTF-32LE"  , "😀👍");
}
static void test(String charsetName, char[] chars) throws Exception {
    test(charsetName, new String(chars));
}
static void test(String charsetName, String input) throws Exception {
    byte[] bytes = input.getBytes(charsetName);
    System.out.printf("%-12s %-6s", charsetName, new String(bytes, charsetName));
    for (byte b : bytes)
        System.out.printf(" %02x", b);
    System.out.println();
}

Output

ISO-8859-1   HI     48 49
UTF-8        HI     48 49
UTF-16LE     HI     48 00 49 00
UTF-32LE     HI     48 00 00 00 49 00 00 00
ISO-8859-1   ??     3f 3f
UTF-8        ⅓⅔     e2 85 93 e2 85 94
UTF-16LE     ⅓⅔     53 21 54 21
UTF-32LE     ⅓⅔     53 21 00 00 54 21 00 00
UTF-8        😀👍   f0 9f 98 80 f0 9f 91 8d
UTF-16LE     😀👍   3d d8 00 de 3d d8 4d dc
UTF-32LE     😀👍   00 f6 01 00 4d f4 01 00
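
The emoji rows show that "two bytes per char" describes Java's in-memory UTF-16 code units, not characters as a reader sees them: each emoji takes four bytes in UTF-16LE because it is stored as two chars (a surrogate pair). A small sketch of that distinction, assuming Java 8 or later for Character.BYTES; the class name CharSizeDemo and its two helper methods are hypothetical names chosen for this example.

```java
// Hypothetical demo class, named only for this example.
class CharSizeDemo {

    // Number of UTF-16 code units (Java chars) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Size of one char in memory, in bytes.
        System.out.println(Character.BYTES); // 2

        // The two emoji from the output above, written as surrogate-pair escapes
        // so the example does not depend on the source file encoding.
        String emoji = "\uD83D\uDE00\uD83D\uDC4D";

        System.out.println(charCount(emoji));      // 4
        System.out.println(codePointCount(emoji)); // 2
    }
}
```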
