Java XML parsing: Document (DeferredDocumentImpl) vs. Document (XMLDocument) in different environments

Question

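I'm hitting an XML parsing problem in Java 8: the same code behaves differently in production than it does in lower environments.

Here is the Java snippet, which prints an XML node value for a partial XPath: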

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class XMLParserDemo {
    public static void main(String[] args) throws Exception {
        final String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n" +
                "<securityDataUpdated xmlns:ns0=\"http://xmlns.paddy.com/canonical/1\" xmlns=\"http://xmlns.paddy.com/canonical/2.1\" version=\"362\">\n" +
                " <ns0:header>\n" +
                " <ns0:correlationId>c3004188-ae3d-11ee-a506-0242ac120002</ns0:correlationId>\n" +
                " <ns0:sourceSystem>CASH</ns0:sourceSystem>\n" +
                " <ns0:initialCreationDate>2017-05-04</ns0:initialCreationDate>\n" +
                " <ns0:creationTimestamp>2023-10-20T03:59:39.256-04:00</ns0:creationTimestamp>\n" +
                " </ns0:header>\n" +
                " <ns0:comment>hhga1002M</ns0:comment>\n" +
                " <securityTypeIndicator>CASH FLOW</securityTypeIndicator>\n" +
                " <publishingAssetClass>CF</publishingAssetClass>\n" +
                " <cashInformation active=\"true\">\n" +
                " <productType system=\"BRDR\">FORWARD</productType>\n" +
                " <name>COP Onsh Fwd OutRt 10Y</name>\n" +
                " <identifier type=\"ICEBERG-UNIQUE\">IX35571185-0</identifier>\n" +
                " <identifier type=\"TICKER\">CLO+10Y</identifier>\n" +
                " <identifier type=\"ICEBERG-ID-GLOBAL\">BBG00GMFSG55</identifier>\n" +
                " <identifier type=\"PADDY-ID\">92157699</identifier>\n" +
                " <identifier type=\"BRDR-UNIQUE-ID\">92157699-NA-NA</identifier>\n" +
                " <issue active=\"true\">\n" +
                " <issueActivityStatus>ACTIVE</issueActivityStatus>\n" +
                " </issue>\n" +
                " <classification>\n" +
                " <marketSector>Curncy</marketSector>\n" +
                " </classification>\n" +
                " <adpDetails>\n" +
                " <adpSrcFlag>false</adpSrcFlag>\n" +
                " <adpPrivateRangeFlag>false</adpPrivateRangeFlag>\n" +
                " </adpDetails>\n" +
                " <BRDRVersionDetails>\n" +
                " <BRDRVersionNumber>362</BRDRVersionNumber>\n" +
                " <BRDRVersionTimestamp>2021-12-10T17:47:32.103-05:00</BRDRVersionTimestamp>\n" +
                " <BRDRPublishedTimestamp>2023-10-20T03:59:39.263-04:00</BRDRPublishedTimestamp>\n" +
                " </BRDRVersionDetails>\n" +
                " <productTaxonomy>\n" +
                " <lrmProductCode>1234</lrmProductCode>\n" +
                " <lrmProductDescription>Obsolete</lrmProductDescription>\n" +
                " </productTaxonomy>\n" +
                " <isOTISSecurity>true</isOTISSecurity>\n" +
                " <isTOMSNSecurity>false</isTOMSNSecurity>\n" +
                " <isPolypathSecurity>false</isPolypathSecurity>\n" +
                " <ISOCountryCode>IN</ISOCountryCode>\n" +
                " </cashInformation>\n" +
                "</securityDataUpdated>";

        Document doc = getDocFromString(xml);
        System.out.println(getValueFromXml("/*//identifier[@type='BRDR-UNIQUE-ID']/text()", doc));
    }

    public static Document getDocFromString(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputSource is = new InputSource(new InputStreamReader(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
        return builder.parse(is);
    }

    public static String getValueFromXml(String xPathExpr, Document doc) throws XPathExpressionException {
        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();

        XPathExpression expr = xpath.compile(xPathExpr);
        Node node = (Node) expr.evaluate(doc, XPathConstants.NODE);
        return node != null ? node.getNodeValue() : null;
    }
}

In lower environments, this code successfully prints the value for any valid XPath. In production, however, the getValueFromXml method returns null for the same XPath expression.

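While remote debugging production with IntelliJ, I noticed that getDocFromString returns a different Document object structure (XMLDocument) in production than in lower environments, where it returns a DeferredDocumentImpl. In lower environments the Document object consists of fields such as fNodeCount and fNodeType, whereas in production it contains fields such as err, validateErrNodeStr = "oracle.xml.schemavalidator.nodeerr", and validateErrParentStr = "oracle.xml.schemavalidator.parenterr".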

[Debugger screenshots of the Document object: development (lower environment) and production]

I'm confused about why the document structure differs in production, given that the code, the XML, and the JDK version (Java 8) are all the same.

Answer 1

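Calling DocumentBuilderFactory.newInstance() searches the classpath for a DOM implementation. Similarly, XPathFactory.newInstance() searches the classpath for an XPath implementation. If different DOM and/or XPath libraries are on the classpath, you will get different results. In principle the implementations should be compatible, and I can't immediately see why they aren't in this case.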
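To check which implementations each environment actually resolves, a small diagnostic along these lines could be run in both places. It is only a sketch: the class name JaxpImplementationReport and the helper methods are made up for illustration, and it relies solely on standard JAXP and java.lang calls.

```java
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.security.CodeSource;

// Hypothetical helper: reports the concrete JAXP classes resolved in this JVM.
public class JaxpImplementationReport {

    /** Returns one line per JAXP component with the concrete class that was resolved. */
    public static String describe() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader("<root/>")));
        XPathFactory xpf = XPathFactory.newInstance();

        return "DocumentBuilderFactory: " + dbf.getClass().getName() + "\n"
                + "DocumentBuilder: " + builder.getClass().getName() + "\n"
                + "Document: " + doc.getClass().getName() + "\n"
                + "XPathFactory: " + xpf.getClass().getName() + "\n"
                + "Factory loaded from: " + location(dbf.getClass());
    }

    // A null code source typically means the class came from the JDK itself
    // rather than from a jar on the application classpath.
    private static String location(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? "no code source (likely the JDK runtime)" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe());
    }
}
```

If the production output names classes under oracle.xml while the lower environments name the JDK's built-in classes, that would be consistent with an Oracle XML parser library sitting on the production classpath.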

It might be because you haven't called setNamespaceAware(true) on the DocumentBuilderFactory, and the behaviour of XPath on a non-namespace-aware DOM is not well defined.

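I would actually expect your XPath expression to select nothing, because the identifier elements are in a namespace, whereas you are searching for identifier elements in no namespace.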
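A namespace-aware variant of the lookup might look like the following sketch. The class name NamespaceAwareXPathDemo, the prefix c, and the trimmed sample XML are assumptions made for illustration; the namespace URI is the default namespace from the question's document. Unlike the question's getValueFromXml, this version returns an empty string rather than null when nothing matches, because it uses the string form of XPath.evaluate.

```java
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical demo class: namespace-aware parsing plus a prefix binding for XPath.
public class NamespaceAwareXPathDemo {

    // Default namespace declared on the root element of the question's XML.
    static final String CANONICAL_NS = "http://xmlns.paddy.com/canonical/2.1";

    // Trimmed-down stand-in for the question's document (illustrative only).
    static final String SAMPLE_XML =
            "<securityDataUpdated xmlns=\"" + CANONICAL_NS + "\">"
                    + "<cashInformation>"
                    + "<identifier type=\"TICKER\">CLO+10Y</identifier>"
                    + "<identifier type=\"BRDR-UNIQUE-ID\">92157699-NA-NA</identifier>"
                    + "</cashInformation>"
                    + "</securityDataUpdated>";

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the step the answer points out is missing
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates the expression as a string; yields "" when no node matches. */
    public static String evaluate(String expression, Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the arbitrary prefix "c" to the document's default namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "c".equals(prefix) ? CANONICAL_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return CANONICAL_NS.equals(uri) ? "c" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return CANONICAL_NS.equals(uri)
                        ? Collections.singletonList("c").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE_XML);
        System.out.println(evaluate("//c:identifier[@type='BRDR-UNIQUE-ID']/text()", doc));
    }
}
```

With a namespace-aware DOM, the unprefixed expression from the question should match nothing under any conforming XPath 1.0 implementation, so the prefixed form makes the behaviour independent of which parser happens to be on the classpath.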
