How can I create the simplest possible PDF/A-2b file with Apache PDFBox that passes veraPDF validation?


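I am using Apache PDFBox to create a very simple PDF that contains a single line of text and conforms to PDF/A-2b, and I want to check its conformance with veraPDF. veraPDF reports that the PDF is not compliant and shows two failed assertions: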

  • TestAssertion [ruleId=RuleId [specification=ISO 19005-2:2011, clause=6.6.2.1, testNumber=1], status=failed, message=The Catalog dictionary of a conforming file shall contain the Metadata key whose value is a metadata stream as defined in ISO 32000-1:2008, 14.3.2., location=Location [level=CosDocument, context=root/document[0]], locationContext=null, errorMessage=null]
  • TestAssertion [ruleId=RuleId [specification=ISO 19005-2:2011, clause=6.2.4.3, testNumber=4], status=failed, message=DeviceGray shall only be used if a device independent DefaultGray colour space has been set when the DeviceGray colour space is used, or if a PDF/A OutputIntent is present., location=Location [level=CosDocument, context=root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[3]/fillCS[0]], locationContext=null, errorMessage=null]

My code looks like this:

try (ByteArrayOutputStream baos = new ByteArrayOutputStream(); PDDocument document = new PDDocument(); COSStream cosStream = new COSStream()) {
    PDPage page = new PDPage();
    document.addPage(page);

    PDDocumentInformation documentInformation = new PDDocumentInformation();
    documentInformation.setTitle("Name");
    documentInformation.setCreator("Creator");
    documentInformation.setSubject("Subject");
    document.setDocumentInformation(documentInformation);

    try (ByteArrayOutputStream xmpOutputStream = new ByteArrayOutputStream(); OutputStream cosXMPStream = cosStream.createOutputStream()) {
        XMPMetadata xmp = XMPMetadata.createXMPMetadata();
        PDFAIdentificationSchema pdfaSchema = xmp.createAndAddPFAIdentificationSchema();
        pdfaSchema.setPart(2);
        pdfaSchema.setConformance("B");
        DublinCoreSchema dublinCoreSchema = xmp.createAndAddDublinCoreSchema();
        dublinCoreSchema.setTitle("Name");
        dublinCoreSchema.addCreator("Creator");
        dublinCoreSchema.setDescription("Subject");
        XMPBasicSchema basicSchema = xmp.createAndAddXMPBasicSchema();
        Calendar creationDate = Calendar.getInstance();
        basicSchema.setCreateDate(creationDate);
        basicSchema.setModifyDate(creationDate);
        basicSchema.setMetadataDate(creationDate);
        basicSchema.setCreatorTool("Creator Tool");
        new XmpSerializer().serialize(xmp, xmpOutputStream, true);
        cosXMPStream.write(xmpOutputStream.toByteArray());
        document.getDocumentCatalog().setMetadata(new PDMetadata(cosStream));
    }

    PDViewerPreferences prefs = new PDViewerPreferences(page.getCOSObject());
    prefs.setDisplayDocTitle(true);
    document.getDocumentCatalog().setViewerPreferences(prefs);

    File fontFile = new File("C:\\Windows\\Fonts\\arial.ttf");
    PDType0Font font = PDType0Font.load(document, fontFile);

    PDPageContentStream contentStream = new PDPageContentStream(document, page);
    contentStream.beginText();
    contentStream.setFont(font, 12);
    contentStream.newLineAtOffset(100, 700);
    contentStream.showText("Hello PDF/A-2b World!");
    contentStream.endText();
    contentStream.close();

    document.save(baos);
    try (PDFAParser parser = Foundries.defaultInstance().createParser(new ByteArrayInputStream(baos.toByteArray()), PDFAFlavour.PDFA_2_B)) {
        PDFAValidator validator = Foundries.defaultInstance().createValidator(PDFAFlavour.PDFA_2_B, false);
        ValidationResult result = validator.validate(parser);
        System.out.println(result.isCompliant());
    }
}

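When I inspect the generated PDF with debugger-app-2.0.31.jar, I can find the metadata. When I compare it with the metadata of a PDF file from the veraPDF regression tests (for example, this one), the only difference that looks relevant to me is the begin attribute of the xpacket header. In the veraPDF test file it is empty: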

<?xpacket begin=''
whereas in the file created by PDFBox it appears to contain the BOM byte sequence:
<?xpacket begin=""

Can anyone tell me whether this is a bug in veraPDF or in PDFBox, and whether there is a workaround? Can someone also explain the second error and suggest a solution?

pdfbox pdfa
1 Answer

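The metadata part of the CreatePDFA example in the PDFBox source code is slightly different, although yours looks fine and I was able to validate it with veraPDF: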

XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);

PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);

The second problem is the missing output intent. Add this code:

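// sRGB output intent: with a PDF/A OutputIntent present, DeviceGray is allowed (clause 6.2.4.3).
// The ICC profile is loaded from the resources of the PDFBox examples module; if that module
// is not on your classpath, load your own copy of an sRGB profile instead.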
InputStream colorProfile = CreatePDFA.class.getResourceAsStream(
        "/org/apache/pdfbox/resources/pdfa/sRGB.icc");
PDOutputIntent intent = new PDOutputIntent(doc, colorProfile);
intent.setInfo("sRGB IEC61966-2.1");
intent.setOutputCondition("sRGB IEC61966-2.1");
intent.setOutputConditionIdentifier("sRGB IEC61966-2.1");
intent.setRegistryName("http://www.color.org");
doc.getDocumentCatalog().addOutputIntent(intent);
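
On the begin attribute raised in the question: as far as I know, the XMP packet wrapper allows the begin value to be either the single character U+FEFF (the BOM) or empty, where empty is read as UTF-8. Both forms should therefore be acceptable, so the difference is unlikely to be what veraPDF complains about. The sketch below uses only the JDK and is not part of PDFBox or veraPDF. It classifies the begin value of a packet header so you can see which form a file uses. The class name XPacketBeginCheck, the BeginKind enum, and the assumption that the packet bytes are UTF-8 are all illustrative choices of mine.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a PDFBox or veraPDF API.
final class XPacketBeginCheck {

    /** What the begin attribute of the xpacket header contains. */
    enum BeginKind { EMPTY, BOM, OTHER, MISSING }

    // Matches begin="..." or begin='...' in the xpacket processing instruction.
    private static final Pattern BEGIN =
            Pattern.compile("<\\?xpacket\\s+begin=([\"'])(.*?)\\1");

    static BeginKind classify(byte[] packetBytes) {
        // Assumption: the packet is UTF-8 encoded.
        String packet = new String(packetBytes, StandardCharsets.UTF_8);
        Matcher m = BEGIN.matcher(packet);
        if (!m.find()) {
            return BeginKind.MISSING;
        }
        String value = m.group(2);
        if (value.isEmpty()) {
            return BeginKind.EMPTY;   // form seen in the veraPDF test file
        }
        if (value.equals("\uFEFF")) {
            return BeginKind.BOM;     // form the question reports for PDFBox output
        }
        return BeginKind.OTHER;
    }

    public static void main(String[] args) {
        byte[] emptyForm = "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] bomForm = "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(classify(emptyForm)); // prints EMPTY
        System.out.println(classify(bomForm));   // prints BOM
    }
}
```

To run it against a real file, pass the bytes you already serialize in the metadata snippet above (baos.toByteArray()) to classify. EMPTY or BOM both indicate a well-formed header; OTHER or MISSING would be worth investigating.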