PDFBox 3.0 - How do I fix missing fonts after copying a page to another document?


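I am trying to merge imported pages into a PDF document and copy their font resources along with them. With PDFBox 2.0 the code worked perfectly: as expected, the resulting document contained the required embedded fonts.

Essentially, my code performs the steps below. The first document is a PDF/A with a single empty page, and the second is a PDF/A that uses the Roboto font. All fonts are embedded.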

PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
PDPage sourcePage = Loader.loadPDF(data).getPage(0);
final var copiedPage = targetDoc.importPage(sourcePage);
copiedPage.setResources(sourcePage.getResources());

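With PDFBox 3.0 this no longer seems to work: the resulting document is corrupt when opened in Adobe Acrobat.

Opening it with the PDFBox PreflightParser reports many errors.

These are the error messages from the PreflightParser: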

1.4 Trailer Syntax error, /XRef cross reference streams are not allowed
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor
3.1.14 Invalid Font definition, Unknown font type: XML
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.1.8 Invalid Font definition
3.1.2 Invalid Font definition, BCDGEE+TimesNewRomanPS-BoldMT: some mandatory fields are missing from the FontDescriptor: Type, ItalicAngle, FontBBox, Ascent, FontName, StemV, Flags, CapHeight, Descent.
3.1.3 Invalid Font definition, null: FontFile entry is missing from FontDescriptor
3.3.2 Glyph error, invalid font dictionary ==> 
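
The listing above repeats the same finding many times, which makes the distinct problems hard to spot. As a minimal sketch that uses only the JDK, the helper below counts how often each error code occurs. The class name `PreflightLogSummary` and the method `countByCode` are hypothetical and not part of PDFBox; the sketch assumes each line starts with the error code followed by a space, as in the listing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of PDFBox): condenses a Preflight error listing.
final class PreflightLogSummary {

    // Counts occurrences of each error code (the token before the first space),
    // keeping the codes in order of first appearance.
    static Map<String, Integer> countByCode(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int space = trimmed.indexOf(' ');
            String code = space < 0 ? trimmed : trimmed.substring(0, space);
            counts.merge(code, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "1.4 Trailer Syntax error, /XRef cross reference streams are not allowed",
            "3.3.1 Glyph error, The character code 0 in the font program \"BCDEEE+Calibri\" is missing from the Character Encoding",
            "3.3.1 Glyph error, The character code 0 in the font program \"BCDEEE+Calibri\" is missing from the Character Encoding",
            "3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor");
        System.out.println(countByCode(log)); // prints {1.4=1, 3.3.1=2, 3.1.3=1}
    }
}
```

Run against the full listing, this would show that most entries are the repeated 3.3.1 glyph error, alongside a handful of distinct font-definition errors and the 1.4 trailer error.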

Here is a complete test case using PDFBox 3.0.1:

    @Test
    void importPageWithFonts_validateFontInfo() throws IOException {
        // given
        final var targetDocBytes = IOUtils.toByteArray(PdfUtilitiesTest.class.getClassLoader().getResourceAsStream("empty.pdf"));

        String[] additionalFiles = new String[]{
            "roboto-14.pdf",
        };
        PDDocument targetDoc = Loader.loadPDF(targetDocBytes);


        // when
        for (String fileName : Arrays.asList(additionalFiles)) {
            byte[] data = IOUtils.toByteArray(PdfUtilitiesTest.class.getClassLoader().getResourceAsStream(fileName));
            // verify source is valid
            PDPage sourcePage = Loader.loadPDF(data).getPage(0);
            final var copiedPage = targetDoc.importPage(sourcePage);
            copiedPage.setResources(sourcePage.getResources());
            targetDoc.save(Files.createTempFile("merged-fonts", ".pdf").toFile());
        }
        Path tmpFile = Files.createTempFile("fscd-merged", ".pdf");
        targetDoc.save(tmpFile.toFile(), CompressParameters.DEFAULT_COMPRESSION);


        // then
        // font errors, e.g. Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor
        assertFontsAreValid(tmpFile);
    }

    private static void assertFontsAreValid(Path tmpFile) throws IOException {
        PreflightParser parser = new PreflightParser(tmpFile.toFile());
        final var documentToVerify = (PreflightDocument) parser.parse();
        // Get validation result
        final var result = documentToVerify.validate();
        final var resultString = result.getErrorsList().stream()
            .filter(err -> !err.getErrorCode()
                .matches("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3")) // filter findings from the source documents
            .map(err -> err.getErrorCode() + " " + err.getDetails()).collect(Collectors.joining("\n"));
        assertTrue(resultString.isBlank(), resultString);
    }
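
The filter in `assertFontsAreValid` relies on `String.matches`, which only succeeds when the whole error code matches one of the alternatives. The sketch below isolates that behaviour with plain JDK classes. `ErrorCodeFilter`, `isIgnored` and `remaining` are hypothetical names introduced for illustration; the pattern is the one from the test above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper (not part of PDFBox) mirroring the filter in the test case.
final class ErrorCodeFilter {

    // Codes the asker attributes to the source documents; copied from the test above.
    private static final Pattern KNOWN_SOURCE_CODES =
        Pattern.compile("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3");

    // True only if the complete code equals one of the listed alternatives.
    static boolean isIgnored(String errorCode) {
        return KNOWN_SOURCE_CODES.matcher(errorCode).matches();
    }

    // Returns the codes that survive the filter, in their original order.
    static List<String> remaining(List<String> codes) {
        return codes.stream()
            .filter(code -> !isIgnored(code))
            .collect(Collectors.toList());
    }
}
```

Because the match is anchored to the full string, a code such as 3.1.11 is dropped while 3.1.3 and 3.3.1 are still reported, so the font errors shown earlier are not hidden by the filter.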

Tags: java pdf pdf-generation pdfbox
2 Answers

0 votes

This is a bug in PDFBox version 3.0.1. Please see the JIRA ticket for more details.


0 votes

I think I may have the same problem (I am trying to add text).

I tried pdfbox-3.0.3-20240407.043756-30.jar and the problem persists. I also checked 4.0, with the same result.
