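I have to add an attribute to XML nodes in a document with more than 10k records. What is the best way to transform the XML document faster?
I have tried the StAX parser, which takes almost 4 minutes to add the attributes; with the SAX parser it takes 5 minutes.
Is there another library that can do better, or another way to do this? Please share your suggestions.
Sample code (using the StAX parser):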
try {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader r = factory.createXMLStreamReader(new FileInputStream(inputfile));
    /* Start Writing document */
    XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
    XMLEventWriter xmlEventWriter = xmlOutputFactory.createXMLEventWriter(new FileOutputStream(outputfile),
            "UTF-8");
    /* End Writing document */
    int event = r.getEventType();
    long startTime = System.currentTimeMillis();
    System.out.println("Started reading node from xml document....." + TimeUnit.MILLISECONDS.toSeconds(startTime));
    int node1Cnt = 0, node2Cnt = 0, node3Cnt = 0, node4Cnt = 0;
    while (true) {
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        switch (event) {
        case XMLStreamConstants.START_DOCUMENT:
            // System.out.println("Start Document.");
            StartDocument startDocument = eventFactory.createStartDocument();
            xmlEventWriter.add(startDocument);
            break;
        case XMLStreamConstants.START_ELEMENT:
            // Create Start node
            if (r.getLocalName().equalsIgnoreCase(node1)) {
                // New node1: bump its counter and restart the node2 numbering
                node1Cnt++;
                node2Cnt = 0;
                Attribute attribute = eventFactory.createAttribute("id", "5522" + node1Cnt);
                List<Attribute> attributeList = Arrays.asList(attribute);
                List<Namespace> nsList = Collections.emptyList();
                StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName(),
                        attributeList.iterator(), nsList.iterator());
                xmlEventWriter.add(sElement);
            } else if (r.getLocalName().equalsIgnoreCase(node2)) {
                node2Cnt++;
                Attribute attribute = eventFactory.createAttribute("id", "5522" + node1Cnt + node2Cnt);
                List<Attribute> attributeList = Arrays.asList(attribute);
                List<Namespace> nsList = Collections.emptyList();
                StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName(),
                        attributeList.iterator(), nsList.iterator());
                xmlEventWriter.add(sElement);
            } else {
                // Any other element is copied without an id attribute
                StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName());
                xmlEventWriter.add(sElement);
            }
            break;
        case XMLStreamConstants.CHARACTERS:
            if (r.isWhiteSpace())
                break; // System.out.println("Text: " + r.getText());
            Characters characters = eventFactory.createCharacters(r.getText());
            xmlEventWriter.add(characters);
            break;
        case XMLStreamConstants.END_ELEMENT:
            // System.out.println("End Element:" + r.getName());
            EndElement endElement = eventFactory.createEndElement("", "", r.getLocalName());
            xmlEventWriter.add(endElement);
            break;
        case XMLStreamConstants.END_DOCUMENT:
            xmlEventWriter.add(eventFactory.createEndDocument());
            break;
        }
        if (!r.hasNext())
            break;
        event = r.next();
    }
    // Close the writer so buffered output is flushed to the file
    xmlEventWriter.close();
    r.close();
    System.out.println("Ended reading node from xml document....."
            + (TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis())
            - TimeUnit.MILLISECONDS.toSeconds(startTime)));
} catch (XMLStreamException ex) {
    ex.printStackTrace();
} catch (IOException ex) {
    // TODO Auto-generated catch block
    ex.printStackTrace();
} finally {
    System.out.println("finish!!");
}
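I suspect that XMLEventFactory.newInstance() is very expensive, because it involves searching the classpath. There is absolutely no need to create a new factory inside the event loop: create one factory at the start and reuse it.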
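As a rough sketch of that change, here is a cut-down version of the question's loop with the XMLEventFactory created once, before the loop. The class name ReusedFactoryCopy, the method addIds, and the use of in-memory strings instead of files are assumptions made to keep the example self-contained. Like the original code, it drops any existing attributes and namespaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Namespace;

// Hypothetical helper class, not part of the original question.
class ReusedFactoryCopy {

    // Copies the document and adds id="5522<n>" to every element named `target`.
    static String addIds(String xml, String target) {
        try {
            // Factories are created once, outside the event loop.
            XMLEventFactory eventFactory = XMLEventFactory.newInstance();
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(out);

            int count = 0;
            int event = r.getEventType();
            while (true) {
                switch (event) {
                case XMLStreamConstants.START_DOCUMENT:
                    w.add(eventFactory.createStartDocument());
                    break;
                case XMLStreamConstants.START_ELEMENT:
                    if (r.getLocalName().equalsIgnoreCase(target)) {
                        count++;
                        Attribute id = eventFactory.createAttribute("id", "5522" + count);
                        w.add(eventFactory.createStartElement("", "", r.getLocalName(),
                                Collections.singletonList(id).iterator(),
                                Collections.<Namespace>emptyIterator()));
                    } else {
                        w.add(eventFactory.createStartElement("", "", r.getLocalName()));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (!r.isWhiteSpace()) {
                        w.add(eventFactory.createCharacters(r.getText()));
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    w.add(eventFactory.createEndElement("", "", r.getLocalName()));
                    break;
                case XMLStreamConstants.END_DOCUMENT:
                    w.add(eventFactory.createEndDocument());
                    break;
                default:
                    break; // comments, processing instructions etc. are skipped
                }
                if (!r.hasNext()) {
                    break;
                }
                event = r.next();
            }
            w.close();
            r.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML copy failed", e);
        }
    }
}
```

Calling ReusedFactoryCopy.addIds(xml, "item") on a small document should number each item element in document order. Whether this removes most of the 4 minutes is something only a measurement on the real input can tell.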
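Beyond that, I suspect that using an XMLStreamWriter would be both easier and faster than using an XMLEventWriter.
(But these performance suggestions are guesses: when tuning performance, you need to take measurements to assess the impact of code changes.)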
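For illustration, the XMLStreamWriter variant might look roughly like the following. This is a sketch under assumptions, not a tuned implementation: the class name StreamWriterCopy, the method addIds, and the in-memory input are invented for the example, and node1 and node2 stand for the two element names from the question. It follows the question's numbering scheme (the node2 counter restarts at each node1) and, like the original, discards existing attributes and namespaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, not part of the original question.
class StreamWriterCopy {

    // Copies the document, writing directly to an XMLStreamWriter
    // so that no event objects need to be created at all.
    static String addIds(String xml, String node1, String node2) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

            int node1Cnt = 0;
            int node2Cnt = 0;
            w.writeStartDocument();
            while (r.hasNext()) {
                switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    String name = r.getLocalName();
                    w.writeStartElement(name);
                    if (name.equalsIgnoreCase(node1)) {
                        node1Cnt++;
                        node2Cnt = 0; // node2 numbering restarts under each node1
                        w.writeAttribute("id", "5522" + node1Cnt);
                    } else if (name.equalsIgnoreCase(node2)) {
                        node2Cnt++;
                        w.writeAttribute("id", "5522" + node1Cnt + node2Cnt);
                    }
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                    if (!r.isWhiteSpace()) {
                        w.writeCharacters(r.getText());
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    w.writeEndElement();
                    break;
                default:
                    break; // other event types are skipped in this sketch
                }
            }
            w.writeEndDocument();
            w.close();
            r.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML copy failed", e);
        }
    }
}
```

To compare the two approaches, wrap each call in System.nanoTime() readings and run it several times on the real input file, since the first run includes class loading and JIT warm-up.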
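Personally, I would write this in XSLT. You haven't given enough detail about the transformation, but in XSLT 3.0 it would be something like this: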
<xsl:transform....>
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:template match="node1">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:variable name="node1id" as="xs:string">
        <xsl:text>5522</xsl:text>
        <xsl:number/>
      </xsl:variable>
      <xsl:attribute name="id" select="$node1id"/>
      <xsl:apply-templates>
        <xsl:with-param name="node1id" select="$node1id" tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="node2">
    <xsl:param name="node1id" tunnel="yes"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="id">
        <xsl:value-of select="$node1id"/>
        <xsl:number/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:transform>