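I am using Apache PDFBox to create a very simple PDF that contains a single line of text and is meant to conform to PDF/A-2b, and I want to check its conformance with VeraPDF. VeraPDF tells me the PDF is not compliant and shows two failed assertions: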
TestAssertion [ruleId=RuleId [specification=ISO 19005-2:2011, clause=6.6.2.1, testNumber=1], status=failed, message=The Catalog dictionary of a conforming file shall contain the Metadata key whose value is a metadata stream as defined in ISO 32000-1:2008, 14.3.2., location=Location [level=CosDocument, context=root/document[0]], locationContext=null, errorMessage=null]
TestAssertion [ruleId=RuleId [specification=ISO 19005-2:2011, clause=6.2.4.3, testNumber=4], status=failed, message=DeviceGray shall only be used if a device independent DefaultGray colour space has been set when the DeviceGray colour space is used, or if a PDF/A OutputIntent is present., location=Location [level=CosDocument, context=root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[3]/fillCS[0]], locationContext=null, errorMessage=null]
My code looks like this:
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
         PDDocument document = new PDDocument();
         COSStream cosStream = new COSStream()) {
        PDPage page = new PDPage();
        document.addPage(page);
        PDDocumentInformation documentInformation = new PDDocumentInformation();
        documentInformation.setTitle("Name");
        documentInformation.setCreator("Creator");
        documentInformation.setSubject("Subject");
        document.setDocumentInformation(documentInformation);
        try (ByteArrayOutputStream xmpOutputStream = new ByteArrayOutputStream();
             OutputStream cosXMPStream = cosStream.createOutputStream()) {
            XMPMetadata xmp = XMPMetadata.createXMPMetadata();
            PDFAIdentificationSchema pdfaSchema = xmp.createAndAddPFAIdentificationSchema();
            pdfaSchema.setPart(2);
            pdfaSchema.setConformance("B");
            DublinCoreSchema dublinCoreSchema = xmp.createAndAddDublinCoreSchema();
            dublinCoreSchema.setTitle("Name");
            dublinCoreSchema.addCreator("Creator");
            dublinCoreSchema.setDescription("Subject");
            XMPBasicSchema basicSchema = xmp.createAndAddXMPBasicSchema();
            Calendar creationDate = Calendar.getInstance();
            basicSchema.setCreateDate(creationDate);
            basicSchema.setModifyDate(creationDate);
            basicSchema.setMetadataDate(creationDate);
            basicSchema.setCreatorTool("Creator Tool");
            new XmpSerializer().serialize(xmp, xmpOutputStream, true);
            cosXMPStream.write(xmpOutputStream.toByteArray());
            document.getDocumentCatalog().setMetadata(new PDMetadata(cosStream));
        }
        PDViewerPreferences prefs = new PDViewerPreferences(page.getCOSObject());
        prefs.setDisplayDocTitle(true);
        document.getDocumentCatalog().setViewerPreferences(prefs);
        File fontFile = new File("C:\\Windows\\Fonts\\arial.ttf");
        PDType0Font font = PDType0Font.load(document, fontFile);
        PDPageContentStream contentStream = new PDPageContentStream(document, page);
        contentStream.beginText();
        contentStream.setFont(font, 12);
        contentStream.newLineAtOffset(100, 700);
        contentStream.showText("Hello PDF/A-2b World!");
        contentStream.endText();
        contentStream.close();
        document.save(baos);
        try (PDFAParser parser = Foundries.defaultInstance().createParser(new ByteArrayInputStream(baos.toByteArray()), PDFAFlavour.PDFA_2_B)) {
            PDFAValidator validator = Foundries.defaultInstance().createValidator(PDFAFlavour.PDFA_2_B, false);
            ValidationResult result = validator.validate(parser);
            System.out.println(result.isCompliant());
        }
    }
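When I inspect the generated PDF with debugger-app-2.0.31.jar, I can find the metadata. When I compare that metadata with the PDF files from the VeraPDF regression tests (for example this one), the only difference that looks relevant to me is the begin attribute of the xpacket header. In the VeraPDF test file it is empty, <?xpacket begin='', while in the file created by PDFBox it appears to contain a BOM start sequence (the invisible character U+FEFF), <?xpacket begin="".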
Can anyone tell me whether this is a bug in VeraPDF or in PDFBox? Is there a workaround for it? And can someone explain the second error to me and offer a solution?
The metadata part of the CreatePDFA example in the source code is slightly different, although yours looks fine and I was able to validate it with VeraPDF:
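    // Serialize the XMP packet (xmp is the XMPMetadata object built earlier)
    XmpSerializer serializer = new XmpSerializer();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    serializer.serialize(xmp, baos, true);
    // Let PDMetadata create its own stream inside the document (doc is the PDDocument)
    PDMetadata metadata = new PDMetadata(doc);
    metadata.importXMPMetadata(baos.toByteArray());
    doc.getDocumentCatalog().setMetadata(metadata);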
The second problem is the missing output intent. Add this code:
    // sRGB output intent
    InputStream colorProfile = CreatePDFA.class.getResourceAsStream(
            "/org/apache/pdfbox/resources/pdfa/sRGB.icc");
    PDOutputIntent intent = new PDOutputIntent(doc, colorProfile);
    intent.setInfo("sRGB IEC61966-2.1");
    intent.setOutputCondition("sRGB IEC61966-2.1");
    intent.setOutputConditionIdentifier("sRGB IEC61966-2.1");
    intent.setRegistryName("http://www.color.org");
    doc.getDocumentCatalog().addOutputIntent(intent);
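
On the begin attribute raised in the question: as far as I know, the XMP packet header allows the value to be either empty or the single character U+FEFF, so the invisible character written by PDFBox is unlikely to be what VeraPDF objects to. The following is a minimal sketch, using only the JDK, for checking which form a serialized packet uses. The class name XpacketBeginCheck and the method describeBegin are hypothetical names chosen for illustration; they are not part of PDFBox or VeraPDF.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XpacketBeginCheck {
    // Matches the begin attribute of an xpacket header, with either quote style.
    private static final Pattern BEGIN =
            Pattern.compile("<\\?xpacket\\s+begin=(['\"])(.*?)\\1");

    /** Classifies the begin attribute of the first xpacket header found in the bytes. */
    static String describeBegin(byte[] xmpBytes) {
        // Assumption: the packet is UTF-8 encoded, as in the files discussed here.
        String xml = new String(xmpBytes, StandardCharsets.UTF_8);
        Matcher m = BEGIN.matcher(xml);
        if (!m.find()) {
            return "no xpacket header";
        }
        String value = m.group(2);
        if (value.isEmpty()) {
            return "empty";
        }
        if (value.equals("\uFEFF")) {
            return "BOM (U+FEFF)";
        }
        return "other";
    }

    public static void main(String[] args) {
        byte[] withBom = "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] empty = "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(describeBegin(withBom));
        System.out.println(describeBegin(empty));
    }
}
```

To apply it to the code above, pass the byte array produced by XmpSerializer (baos.toByteArray()) to describeBegin. Either "empty" or "BOM (U+FEFF)" would suggest the header itself is well-formed.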