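I am trying to merge an imported page into a PDF document and copy its font resources. With PDFBox 2.0 the code works exactly as expected: the resulting document contains the required embedded fonts.
Essentially, my code performs the steps below. The first document is a PDF/A with a single empty page, and the second is a PDF/A that uses the Roboto font. All fonts are embedded.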
PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
PDPage sourcePage = Loader.loadPDF(data).getPage(0);
final var copiedPage = targetDoc.importPage(sourcePage);
copiedPage.setResources(sourcePage.getResources());
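In PDFBox 3.0 this no longer seems to work: when the resulting document is opened in Adobe Acrobat, it is corrupted.
Opening it with the PDFBox PreflightParser reports many errors.
These are the error messages from the preflight parser: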
1.4 Trailer Syntax error, /XRef cross reference streams are not allowed
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor
3.1.14 Invalid Font definition, Unknown font type: XML
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.1.8 Invalid Font definition
3.1.2 Invalid Font definition, BCDGEE+TimesNewRomanPS-BoldMT: some mandatory fields are missing from the FontDescriptor: Type, ItalicAngle, FontBBox, Ascent, FontName, StemV, Flags, CapHeight, Descent.
3.1.3 Invalid Font definition, null: FontFile entry is missing from FontDescriptor
3.3.2 Glyph error, invalid font dictionary ==>
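The first error (1.4) concerns the file structure rather than the fonts. PDF/A-1 is based on PDF 1.4, and cross-reference streams were only introduced in PDF 1.5, so PDF/A-1 preflight rejects them. Saving with `CompressParameters.DEFAULT_COMPRESSION` (as the test case below does) packs objects into object streams, which require a cross-reference stream; saving with `CompressParameters.NO_COMPRESSION` should avoid this particular error, although it would not explain the font errors. A quick, stand-alone way to check which kind of cross-reference section a saved file ends with is to follow the byte offset after the last `startxref` keyword: a classic table starts with the keyword `xref` there, while a cross-reference stream starts with an object header such as `12 0 obj`. The sketch below is a simplified heuristic (the class `XrefKind` is hypothetical): it does not detect hybrid files whose trailer also carries an `/XRefStm` entry and does not otherwise validate the PDF.

```java
import java.nio.charset.StandardCharsets;

final class XrefKind {

    /**
     * Returns true if the last cross-reference section is a classic "xref" table,
     * false if the offset points at something else (for example a cross-reference
     * stream object such as "12 0 obj"). Simplified heuristic: hybrid files are not detected.
     */
    static boolean usesClassicXrefTable(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            throw new IllegalArgumentException("no startxref keyword found");
        }
        // Skip the whitespace between "startxref" and the decimal offset.
        int start = keyword + "startxref".length();
        while (start < text.length() && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        int end = start;
        while (end < text.length() && Character.isDigit(text.charAt(end))) {
            end++;
        }
        if (end == start) {
            throw new IllegalArgumentException("no offset after startxref");
        }
        int offset = Integer.parseInt(text.substring(start, end));
        if (offset >= text.length()) {
            throw new IllegalArgumentException("startxref offset is past the end of the file");
        }
        // A classic cross-reference table begins with the keyword "xref" at that offset.
        return text.startsWith("xref", offset);
    }
}
```

Applied to the saved file, e.g. `XrefKind.usesClassicXrefTable(Files.readAllBytes(tmpFile))`, a result of `false` means the last cross-reference section is not a classic table, which matches the 1.4 error above.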
Here is the complete test case, using PDFBox 3.0.1:
@Test
void importPageWithFonts_validateFontInfo() throws IOException {
// given
final var targetDocBytes = IOUtils.toByteArray(PdfUtilitiesTest.class.getClassLoader().getResourceAsStream("empty.pdf"));
String[] additionalFiles = new String[]{
"roboto-14.pdf",
};
PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
// when
for (String fileName : Arrays.asList(additionalFiles)) {
byte[] data = IOUtils.toByteArray(PdfUtilitiesTest.class.getClassLoader().getResourceAsStream(fileName));
// load the first page of the source document (the source PDDocument is never closed here)
PDPage sourcePage = Loader.loadPDF(data).getPage(0);
final var copiedPage = targetDoc.importPage(sourcePage);
copiedPage.setResources(sourcePage.getResources());
targetDoc.save(Files.createTempFile("merged-fonts", ".pdf").toFile());
}
Path tmpFile = Files.createTempFile("fscd-merged", ".pdf");
targetDoc.save(tmpFile.toFile(), CompressParameters.DEFAULT_COMPRESSION);
// then
// font errors, e.g. Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor
assertFontsAreValid(tmpFile);
}
private static void assertFontsAreValid(Path tmpFile) throws IOException {
PreflightParser parser = new PreflightParser(tmpFile.toFile());
final var documentToVerify = (PreflightDocument) parser.parse();
// Get validation result
final var result = documentToVerify.validate();
final var resultString = result.getErrorsList().stream()
.filter(err -> !err.getErrorCode()
.matches("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3")) // filter findings from the source documents
.map(err -> err.getErrorCode() + " " + err.getDetails()).collect(Collectors.joining("\n"));
assertTrue(resultString.isBlank(), resultString);
}
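Because the preflight output repeats the same code many times (3.3.1 above, for example), it can help to summarise the remaining findings per error code instead of joining every line. Below is a minimal, dependency-free sketch of the same filtering idea used in `assertFontsAreValid`, working on plain lines of the form `<code> <details>` as printed above. The class name `PreflightLogSummary` is hypothetical; in the real test you would feed it `err.getErrorCode() + " " + err.getDetails()` for each entry of `result.getErrorsList()`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class PreflightLogSummary {

    // Same ignore list as the regex in assertFontsAreValid; matches() requires a whole-code match.
    static final Pattern IGNORED = Pattern.compile("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3");

    /** Counts preflight findings per error code, skipping ignored codes, in first-seen order. */
    static Map<String, Long> countByCode(List<String> lines) {
        return lines.stream()
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split(" ", 2)[0]) // the error code is the first token
                .filter(code -> !IGNORED.matcher(code).matches())
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }
}
```

A `LinkedHashMap` keeps the codes in the order they first appear, so the summary follows the order of the original preflight output.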
This is a bug in PDFBox 3.0.1. See the JIRA ticket for more details.
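I think I may have the same problem (in my case I am trying to add text).
I tried pdfbox-3.0.3-20240407.043756-30.jar and the problem is still there. I also checked 4.0, with the same result.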