Converting a string representation of a hex byte array to a String with non-ASCII characters in Java

Question

A client sends me a string in a request payload, which looks like this:

"[0xc3][0xa1][0xc3][0xa9][0xc3][0xad][0xc3][0xb3][0xc3][0xba][0xc3][0x81][0xc3][0x89][0xc3][0x8d][0xc3][0x93][0xc3][0x9a]Departms"

I want to get the string "áéíóúÁÉÍÓÚDepartms". How can I do this in Java?

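The problem is that I have no control over how the client encodes this string. It seems the client encodes only the non-ASCII characters in this format and sends ASCII characters as-is (see the "Departms" at the end).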

Answer 1

Edit: this answer has two problems, so I recommend not using it.

The first two hex values in the question are 0xc3 and 0xa1. Together they form the UTF-8 encoding of the letter á.
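To make this concrete, here is a minimal sketch that decodes just those two bytes as UTF-8. The class and method names are hypothetical and only serve the illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8PairDemo {
    // Decodes the two bytes 0xC3 0xA1 as a UTF-8 sequence.
    static String decodePair() {
        byte[] bytes = { (byte) 0xC3, (byte) 0xA1 };
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decodePair();
        // The two bytes decode to a single char, U+00E1 (á).
        System.out.println(s.length() + " char, code point U+"
                + Integer.toHexString(s.codePointAt(0)).toUpperCase());
    }
}
```

The two bytes collapse into one character, which is why the bracketed values cannot be converted one at a time.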

The Integer.byteValue() method can be used to convert an int to a byte. Combining it with Integer.decode(), which accepts hex strings such as "0xc3", we can do the following:

Integer.decode("0xc3").byteValue();

Then, using the approach above and assuming the whole string is UTF-8:

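import java.nio.charset.StandardCharsets;

// Assumes every "[0x..]" token comes first, followed by a non-empty plain-text tail.
private static String myDecoder(String str) {
    // "[0xc3][0xa1]Departms" -> {"0xc3", "0xa1", "Departms"}
    String[] myBytesStr = str.replace("[", "").split("]");
    int len = myBytesStr.length - 1; // last element is not hex, just chars
    byte[] myBytes = new byte[len];
    for (int i = 0; i < len; i++) {
        myBytes[i] = Integer.decode(myBytesStr[i]).byteValue();
    }
    // Decode the collected bytes as UTF-8, then append the plain-text tail.
    return new String(myBytes, StandardCharsets.UTF_8) + myBytesStr[len];
}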

For the input given in the question, this returns:

áéíóúÁÉÍÓÚDepartms
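One limitation of the method above is that it expects all bracketed bytes to come first and the plain text to come last. The following is a minimal sketch of a variant that tolerates plain text anywhere in the input. It assumes the token format is exactly "[0x" plus two hex digits plus "]" and that plain text between tokens should be treated as UTF-8. The class name BracketHexDecoder and its members are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketHexDecoder {
    // Assumed token format: "[0x" + two hex digits + "]".
    private static final Pattern TOKEN = Pattern.compile("\\[0x([0-9a-fA-F]{2})\\]");

    static String decode(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Matcher m = TOKEN.matcher(input);
        int last = 0;
        while (m.find()) {
            // Copy any plain text before this token as UTF-8 bytes.
            byte[] literal = input.substring(last, m.start()).getBytes(StandardCharsets.UTF_8);
            out.write(literal, 0, literal.length);
            // Append the byte described by the token itself.
            out.write(Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever plain text follows the final token.
        byte[] tail = input.substring(last).getBytes(StandardCharsets.UTF_8);
        out.write(tail, 0, tail.length);
        // Decode the whole byte sequence in one pass so multi-byte characters are reassembled.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("[0xc3][0xa1]Dept[0xc3][0xa9]"));
    }
}
```

Collecting everything into a single byte buffer before decoding keeps multi-byte sequences intact even when they are split across several tokens.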

Answer 2

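The content inside the square brackets appears to be UTF-8 encoded characters that were converted to hex strings in an odd way. What you can do is find every instance that looks like [0xc3], convert it to the corresponding byte, and then create a new string from those bytes.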

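Unfortunately, there are no good tools for working with byte arrays. Here is a quick and dirty solution that uses a regex to find these hex codes and replace each one with the corresponding Latin-1 character, then fixes the result by reinterpreting the bytes.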

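import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String bracketDecode(String str) {
    // Match tokens such as [0xc3]; accept upper- and lower-case hex digits.
    Pattern p = Pattern.compile("\\[(0x[0-9a-fA-F]{2})\\]");
    Matcher m = p.matcher(str);
    StringBuilder sb = new StringBuilder();
    while (m.find()) {
        String group = m.group(1);
        Integer decode = Integer.decode(group);
        // assume latin-1 encoding; quote the replacement so that '$' (0x24)
        // and '\' (0x5c) are not treated as group references or escapes
        // (Character.toString(int) requires Java 11+)
        m.appendReplacement(sb, Matcher.quoteReplacement(Character.toString(decode)));
    }
    m.appendTail(sb);
    // oh no, latin1 is not correct! re-interpret bytes in utf-8
    // (literal text outside the brackets must be ASCII or Latin-1 to survive this step)
    byte[] bytes = sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    return new String(bytes, StandardCharsets.UTF_8);
}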
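
The Latin-1 detour works because ISO-8859-1 maps each byte value 0x00 through 0xFF to the code point with the same number, so a string built that way can be turned back into the original bytes without loss. The sketch below checks that property and shows the reinterpretation step in isolation. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // True if every byte value survives bytes -> String -> bytes through ISO-8859-1.
    static boolean roundTrips() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String asLatin1 = new String(all, StandardCharsets.ISO_8859_1);
        return Arrays.equals(all, asLatin1.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Treats each char (expected to be <= U+00FF) as one byte and decodes the bytes as UTF-8.
    static String reinterpretAsUtf8(String latin1Chars) {
        byte[] bytes = latin1Chars.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrips());
        // U+00C3 U+00A1 are the Latin-1 characters for the bytes 0xC3 0xA1.
        System.out.println(reinterpretAsUtf8("\u00c3\u00a1").equals("\u00e1"));
    }
}
```

Characters above U+00FF cannot be represented in ISO-8859-1 and are replaced during getBytes, so this trick is only safe when the surrounding text is limited to that range.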
