Java XML serialization error: invalid UTF-16 surrogate detected

Question (votes: 1, answers: 2)

I have an org.w3c.dom.Document and want to serialize it with the function below, but I get a SAXException. How can I fix this?

import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public static String serializeXmlDocument(Document document) throws Exception
{
    // set up an identity transformer (no stylesheet)
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer trans = transformerFactory.newTransformer();
    trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    trans.setOutputProperty(OutputKeys.INDENT, "yes");
    DOMSource source = new DOMSource(document);

    // create string from xml tree
    StringWriter stringWriter = new StringWriter();
    StreamResult stringResult = new StreamResult(stringWriter);
    trans.transform(source, stringResult);

    return stringWriter.toString();
}

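This results in the following error (the German messages read "I/O error" and "Invalid UTF-16 surrogate detected: d835 20 ?"):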

2014-07-20 03:03:36,451 ERROR  [XXX] XXX main job error:  
javax.xml.transform.TransformerException: org.xml.sax.SAXException: E/A-Fehler
java.io.IOException: Ungültige UTF-16-Ersetzung festgestellt: d835 20 ?
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:758)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:359)
    at mypackage.handler.XmlHandler.serializeXmlDocument(XmlHandler.java:226)
    at mypackage.subpackage.buildSolrXml(MyJob.java:213)
    at mypackage.subpackage.doJob(MyJob.java:113)
    at mypackage.MyWorkstation.main(MyWorkstation.java:27)
Caused by: org.xml.sax.SAXException: E/A-Fehler
java.io.IOException: Ungültige UTF-16-Ersetzung festgestellt: d835 20 ?
    at com.sun.org.apache.xml.internal.serializer.ToStream.cdata(ToStream.java:1290)
    at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1395)
    at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:814)
    at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:348)
    at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:122)
    at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230)
    at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230)
    at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230)
    at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:136)
    at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:98)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:702)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:746)
    ... 5 more
Caused by: java.io.IOException: Ungültige UTF-16-Ersetzung festgestellt: d835 20 ?
    at com.sun.org.apache.xml.internal.serializer.ToStream.writeUTF16Surrogate(ToStream.java:973)
    at com.sun.org.apache.xml.internal.serializer.ToStream.writeNormalizedChars(ToStream.java:1110)
    at com.sun.org.apache.xml.internal.serializer.ToStream.cdata(ToStream.java:1267)
    ... 16 more
Tags: java, xml, serialization, xml-serialization

2 answers

Answer 1 (votes: 0)

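The document contains invalid Unicode characters, such as U+D835, a high surrogate that is only valid as the first half of a surrogate pair:

http://www.fileformat.info/info/unicode/char/d835/index.htm

I fixed it with the solution from "removing invalid XML characters from a string in java":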

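// Replace every character outside the XML 1.0 Char range with a space.
// The last range is written as surrogate pairs and covers U+10000 to U+10FFFF,
// so an unpaired surrogate such as D835 falls outside the class and is replaced.
String xml10pattern = "[^"
        + "\u0009\r\n"
        + "\u0020-\uD7FF"
        + "\uE000-\uFFFD"
        + "\ud800\udc00-\udbff\udfff"
        + "]";

stringValue = stringValue.replaceAll(xml10pattern, " ");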
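
The same filtering can also be done one code point at a time, without a regular expression. The following is only a minimal sketch of that idea: the class and method names (`XmlSanitizer`, `isXml10Char`, `sanitize`) are hypothetical and do not come from the linked answer, and the ranges are the same ones used in the pattern above.

```java
public class XmlSanitizer {

    /** Returns true if the code point is inside the ranges accepted by the pattern above. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every code point outside those ranges with a single space. */
    public static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // codePointAt returns an unpaired surrogate as its own value,
            // which then fails the range check and is replaced.
            int cp = input.codePointAt(i);
            out.appendCodePoint(isXml10Char(cp) ? cp : ' ');
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "a", a lone high surrogate, a space, "b": the same shape as "d835 20" in the trace
        String broken = "a\uD835 b";
        System.out.println(sanitize(broken).equals("a  b")); // prints "true"

        // A well-formed surrogate pair is left untouched
        String pair = new String(Character.toChars(0x1D703));
        System.out.println(sanitize(pair).equals(pair)); // prints "true"
    }
}
```

Presumably this would be applied to each string before it is placed into the DOM, so that the serializer never sees an unpaired surrogate.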

Answer 2 (votes: 0)

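This is not always caused by invalid UTF-16 characters. If a multi-byte UTF-8/16/32 character straddles a 1024-byte boundary anywhere in the stream, the Xalan XSLTC processor splits the character in two. That produces two incorrect characters and, in most cases, the error above.

This is due to a Xalan bug (a 1024-byte buffer), which is addressed in OpenJDK 12.

The simplest file that triggers this error is: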

<?xml version="1.0" ?><x>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx𝜃</x>
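
One way to tell the two causes apart is to scan the text for unpaired surrogates before serializing. If none are found and the serializer still fails, the buffer-boundary bug is the more likely explanation. The sketch below is illustrative only: `SurrogateCheck` and `firstUnpairedSurrogate` are hypothetical names, and it assumes the final character of the sample file is U+1D703.

```java
public class SurrogateCheck {

    /** Returns the index of the first unpaired surrogate in text, or -1 if there is none. */
    public static int firstUnpairedSurrogate(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < text.length() && Character.isLowSurrogate(text.charAt(i + 1))) {
                    i++; // well-formed pair: skip the low half
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high surrogate before it
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed code point of the last character in the sample file
        String theta = new String(Character.toChars(0x1D703));

        // Its two UTF-16 code units; the first matches the value in the error message
        System.out.println(Integer.toHexString(theta.charAt(0))); // prints "d835"
        System.out.println(Integer.toHexString(theta.charAt(1))); // prints "df03"

        // Valid input: nothing to report, so a failure here would point at the buffer bug
        System.out.println(firstUnpairedSurrogate("xxx" + theta)); // prints "-1"

        // Broken input of the shape "d835 20" reported in the question
        System.out.println(firstUnpairedSurrogate("xxx\uD835 ")); // prints "3"
    }
}
```

Since the sample file is well-formed, a check like this would report -1 for it, which is consistent with the failure originating in the serializer rather than in the data.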