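I am trying to figure out why a simple XSLT transformation, which is supposed to convert XML to XML, does not produce valid output.
The transformation just copies everything: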
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="xml" encoding="utf-8" />
  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
With an input XML file like this:
<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="uri:foo">
  <name>丕𠀆𠀅𠀍𠁀</name>
</foo>
the result is:
<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="uri:foo">
  <name>丕&#55360;&#56326;&#55360;&#56325;&#55360;&#56333;&#55360;&#56384;</name>
</foo>
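All the tools I use rely on the (Java) Apache Xalan 2.7.1 XSLT processor, including Eclipse (Mars) with the XSL Developer Tools plugin, in which I created this example.
That plugin reports that the input XML is well-formed but the output XML is not (the character reference &#55360; is an invalid XML character).
Why does my XSLT processor generate invalid XML, and how can I prevent it from doing so?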
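For context, the number in that error message is consistent with a UTF-16 high surrogate. The sketch below assumes the first supplementary character in the sample is U+20006 and shows the two UTF-16 code units Java stores for it, next to the single character reference XML would accept. The class and method names (`SurrogateDemo`, `surrogates`, `validCharRef`) are made up for illustration.

```java
public class SurrogateDemo {

    /**
     * Returns the two UTF-16 code units (high and low surrogate) that Java
     * uses to store a supplementary code point (above U+FFFF).
     */
    static int[] surrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        char[] units = Character.toChars(codePoint);
        return new int[] { units[0], units[1] };
    }

    /** Builds a numeric character reference for the whole code point. */
    static String validCharRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        // Assumption: U+20006, written with escapes to avoid source-encoding issues.
        String sample = "\uD840\uDC06";
        int codePoint = sample.codePointAt(0);
        int[] units = surrogates(codePoint);

        System.out.println("code point:        " + codePoint);
        System.out.println("high surrogate:    " + units[0]);
        System.out.println("low surrogate:     " + units[1]);
        System.out.println("valid reference:   " + validCharRef(codePoint));
        // Escaping each code unit separately yields two references, neither
        // of which names a legal XML character on its own.
        System.out.println("invalid rendering: &#" + units[0] + ";&#" + units[1] + ";");
    }
}
```

A surrogate code unit is only meaningful as half of a pair, which is why a validator rejects a reference such as `&#55360;` even though the characters in the input are legal.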
The actual code looks similar to this (you need Xalan on the classpath):
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class XSLTTest {
    private final TransformerFactory xalanTransFact;

    public XSLTTest() {
        xalanTransFact = new org.apache.xalan.processor.TransformerFactoryImpl();
    }

    public Templates createCustomTransformation(
        File transformation
    ) throws TransformerException, IOException {
        InputStreamReader readerTransformation = null;
        try {
            readerTransformation = new InputStreamReader(
                new FileInputStream(transformation), StandardCharsets.UTF_8);
            Templates transformer = xalanTransFact.newTemplates(
                new StreamSource(readerTransformation)
            );
            return transformer;
        } catch (TransformerException | IOException ex) {
            throw ex;
        } finally {
            try {
                if (readerTransformation != null) {
                    readerTransformation.close();
                }
            } catch (IOException ex) {}
        }
    }

    public File applyCustomTransformation(
        Transformer transformer, Reader transformeeReader, Path out,
        boolean indent
    ) throws TransformerException, IOException {
        Writer writer = null;
        try {
            File file = out.toFile();
            writer = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8);
            if (indent) {
                transformer.setOutputProperty(OutputKeys.INDENT, "yes");
                transformer.setOutputProperty(
                    "{http://xml.apache.org/xslt}indent-amount",
                    String.valueOf(2));
            }
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.transform(
                new StreamSource(transformeeReader),
                new StreamResult(writer));
            return file;
        } catch (TransformerException | IOException ex) {
            throw ex;
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException ex) {}
        }
    }

    private void saveToFile(File selectedFile, String content)
        throws FileNotFoundException, IOException {
        Writer writer = null;
        try {
            writer = new OutputStreamWriter(
                new FileOutputStream(selectedFile), StandardCharsets.UTF_8);
            writer.write(content);
            writer.flush();
        } catch (FileNotFoundException ex) {
            throw ex;
        } catch (IOException ex) {
            throw ex;
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException ex) {
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, TransformerException {
        String xslText = "" +
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
            "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"\n" +
            " version=\"1.0\">\n" +
            " <xsl:output method=\"xml\" encoding=\"utf-8\" />\n" +
            " <xsl:template match=\"*|@*\">\n" +
            " <xsl:copy>\n" +
            " <xsl:apply-templates select=\"*|@*|text()\" />\n" +
            " </xsl:copy>\n" +
            " </xsl:template>\n" +
            "</xsl:stylesheet>";
        String xmlToParse = "" +
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
            "<foo xmlns=\"uri:foo\">\n" +
            " <name>丕𠀆𠀅𠀍𠁀</name>\n" +
            "</foo>";
        XSLTTest test = new XSLTTest();
        Path xsl = Files.createTempFile("test", ".xsl");
        test.saveToFile(xsl.toFile(), xslText);
        Templates templates = test.createCustomTransformation(xsl.toFile());
        Transformer transformer = templates.newTransformer();
        Path xml = Files.createTempFile("test-out", ".xml");
        StringReader reader = new StringReader(xmlToParse);
        test.applyCustomTransformation(transformer, reader, xml, true);
        System.out.println("Result is at: " + xml.toString());
    }
}
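For several reasons, I cannot switch to a different XSLT processor.
As @VGR wrote in the comments, this is a manifestation of the bug https://issues.apache.org/jira/browse/XALANJ-2419.
A comment on that JIRA issue suggests a workaround: use UTF-16 instead of UTF-8 as the output encoding of the transformation, because the bug only affects the latter.
So, in my example, the line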
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
needs to be replaced with
// workaround for https://issues.apache.org/jira/browse/XALANJ-2419
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-16");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
writer.write("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
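while everything else stays the same. The actual file is still written in UTF-8, but the transformation treats the output as UTF-16 internally. This works because the StreamResult wraps a Writer: the bytes on disk are determined by the OutputStreamWriter, not by the encoding output property. Since the serializer's own XML declaration is suppressed (it would announce UTF-16), the declaration is written by hand before transformer.transform(...) is called.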
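Put together, the workaround can be sketched as a small self-contained program. This is only an illustration under some assumptions: the class name `Utf16Workaround` and the method `transform` are made up, the result goes to a `StringWriter` instead of a file, and `TransformerFactory.newInstance()` is used so that it runs without Xalan on the classpath. To try it against Xalan 2.7.1, substitute `new org.apache.xalan.processor.TransformerFactoryImpl()` as in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Utf16Workaround {

    /**
     * Applies the stylesheet to the input and returns the serialized result.
     * The serializer is told the output is UTF-16 and its own XML declaration
     * is suppressed; a UTF-8 declaration is written by hand instead.
     */
    static String transform(String xsl, String xml) throws TransformerException {
        // Assumption: default JAXP factory; swap in the Xalan factory if needed.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));

        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        // workaround for https://issues.apache.org/jira/browse/XALANJ-2419
        transformer.setOutputProperty(OutputKeys.ENCODING, "utf-16");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        // The declaration has to be written before the transformation runs.
        out.write("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
        transformer.transform(
            new StreamSource(new StringReader(xml)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xsl =
            "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
            + "<xsl:template match=\"*|@*\"><xsl:copy>"
            + "<xsl:apply-templates select=\"*|@*|text()\"/>"
            + "</xsl:copy></xsl:template></xsl:stylesheet>";
        // Sample text written with escapes; the code points are assumed.
        String xml = "<foo xmlns=\"uri:foo\"><name>"
            + "\u4E15\uD840\uDC06\uD840\uDC05\uD840\uDC0D\uD840\uDC40"
            + "</name></foo>";
        System.out.println(transform(xsl, xml));
    }
}
```

When the result is written to a file, wrapping the `FileOutputStream` in an `OutputStreamWriter` with `StandardCharsets.UTF_8`, as `applyCustomTransformation` already does, keeps the bytes consistent with the hand-written declaration.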