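I have

<BATCHNAME>&#4; Any</BATCHNAME>

Some tags in my XML request contain the character reference '&#4;' in their values. Without these characters my code works perfectly, but in some cases they are present, and then I get the following error:

[Fatal Error] :144:28: Character reference "&#4" is an invalid XML character.
org.xml.sax.SAXParseException; lineNumber: 144; columnNumber: 28; Character reference "&#4" is an invalid XML character.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at db(AllCommonTasks.java:277)
    at ...

I need these characters for validation.

This is the code I am trying: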
try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    URLConnection urlConnection = new URL(urlString).openConnection();
    urlConnection.addRequestProperty("Accept", "application/xml");
    urlConnection.addRequestProperty("User-Agent", "Mozilla/5.0 ( compatible ) ");
    Document doc = db.parse(urlConnection.getInputStream());
    doc.getDocumentElement().normalize();
    str = convertDocumentToString(doc);
} catch (Exception e) {
    System.err.println("In exception 1");
    e.printStackTrace();
}
How can I solve this?
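According to the Wikipedia page for XML and HTML entity references, a reference of the form &#nnnn; is a numeric character reference that gives a Unicode code point in decimal. That means &#4; is equivalent to Unicode U+0004: END OF TRANSMISSION, which is a non-printing control character. XML 1.0 allows only three characters below U+0020 (tab U+0009, line feed U+000A, and carriage return U+000D), and writing a character as a reference does not exempt it from that rule.
So I think the parser is correct to fail in this case.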
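To see the behaviour in isolation, here is a minimal sketch that parses two small documents from strings instead of a URL. The class name `CharRefDemo` and the helper `tryParse` are made up for this illustration; only the JAXP calls themselves come from the question. It assumes the default JDK parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class CharRefDemo {
    /** Hypothetical helper: returns null if the XML parses, otherwise the parser's message. */
    static String tryParse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
        db.setErrorHandler(new DefaultHandler());
        try {
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // &#9; is a tab, one of the few control characters XML 1.0 allows.
        System.out.println(tryParse("<BATCHNAME>&#9; Any</BATCHNAME>"));
        // &#4; is U+0004, so this should come back with a parser error message.
        System.out.println(tryParse("<BATCHNAME>&#4; Any</BATCHNAME>"));
    }
}
```

The first call should return null and the second a non-null message, which suggests the failure comes from the document content and not from the URL handling in the question.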
In fact, if you look at the source of com.sun.org.apache.xerces.internal.impl.XMLScanner#scanCharReferenceValue, you can see that it calls com.sun.org.apache.xerces.internal.util.XMLChar#isValid:
/**
 * Returns true if the specified character is valid. This method
 * also checks the surrogate character range from 0x10000 to 0x10FFFF.
 * <p>
 * If the program chooses to apply the mask directly to the
 * <code>CHARS</code> array, then they are responsible for checking
 * the surrogate character range.
 *
 * @param c The character to check.
 */
public static boolean isValid(int c) {
    return (c < 0x10000 && (CHARS[c] & MASK_VALID) != 0) ||
           (0x10000 <= c && c <= 0x10FFFF);
} // isValid(int):boolean
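If you cannot get the sender to stop emitting these references, one possible workaround is to read the response into a String and rewrite the invalid references before handing it to the parser. The sketch below is only an illustration: the class `XmlCharRefs`, both method names, and the `[U+0004]` marker format are my own inventions, not part of Xerces or the JDK. `isXml10Char` restates the Char production of the XML 1.0 specification, which is what I assume the `CHARS` table lookup above encodes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlCharRefs {
    // Decimal (&#4;) or hexadecimal (&#x4;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(x[0-9a-fA-F]+|[0-9]+);");

    /** The Char production from the XML 1.0 specification. */
    static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** Rewrites invalid references such as &#4; to a visible marker such as [U+0004]. */
    static String escapeInvalidRefs(String xml) {
        Matcher m = NUMERIC_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.startsWith("x")
                        ? Integer.parseInt(body.substring(1), 16)
                        : Integer.parseInt(body);
            } catch (NumberFormatException e) {
                cp = -1; // number too large to be a code point
            }
            String replacement;
            if (isXml10Char(cp)) {
                replacement = m.group();                     // valid: keep as written
            } else if (cp >= 0) {
                replacement = String.format("[U+%04X]", cp); // invalid: visible marker
            } else {
                replacement = "[invalid reference]";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this, `escapeInvalidRefs("<BATCHNAME>&#4; Any</BATCHNAME>")` gives `<BATCHNAME>[U+0004] Any</BATCHNAME>`, which parses, and your validation step can then look for the marker. Be aware that the regex works on raw text, so it also rewrites matches inside CDATA sections and comments, where `&#4;` would have been literal text anyway.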