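I'm using PDFBox (v2.0.13) to merge PDF files. The files are:
[screenshots of the source files]
The merged file is:
[screenshot of the merged file]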
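Can I remove the blank space so that the content of page 2 moves up onto page 1?
For the merge code I use the PDFBox GitHub example: https://github.com/apache/pdfbox/blob/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/PDFMergerExample.java
The margins and paddings of the table in the HTML and of its parent elements are 0. The code is as follows: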
<div class="table-wrap">
<table id="arOpenItemDetail_save" border="0" cellspacing="1" cellpadding="1" class="table-Y" name="detail">
<THEAD style="display:table-header-group;font-weight:bold" name="detailHeader">
<tr>
<th>Cust#</th>
<th width="20">Order Type</th>
<th>Order No</th>
<th>Doc Terms</th>
<th>Doc Date</th>
<th>Due Date</th>
<th>Days PastDue</th>
<th>Doc Amount</th>
<th>Current</th>
<th>1~30</th>
<th>30+</th>
<th>Ref</th>
<th>Ref2</th>
<th>Reason Code</th></tr>
</THEAD>
<th:block th:each="detail:${list}">
<tr class="odd">
<td align="right" width="20" th:text="${detail.custNo}">1</td>
<td align="center" width="20" th:text="${detail.custNo}">1</td>
<td align="right" th:text="${detail.custNo}">1</td>
<td align="center" th:text="${detail.custNo}">1</td>
<td align="right" th:text="${detail.custNo}">1</td>
<td align="right" th:text="${detail.custNo}">1</td>
<td align="right" th:text="${detail.custNo}"></td>
<td align="right" th:text="${detail.custNo}"></td>
<td align="right" th:text="${detail.custNo}"></td>
<td align="right" th:text="${detail.custNo}"></td>
<td align="right" th:text="${detail.custNo}"></td>
<td align="left" th:text="${detail.custNo}"></td>
<td align="left" th:text="${detail.custNo}"></td>
<td align="left" th:text="${detail.custNo}"></td>
</tr>
</th:block>
</table>
</div>
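This question is essentially about densely merging the pages of one or more PDFs.
Usually, PDF merge methods only merge on a page basis, i.e. they take the pages of the source documents and create a new document containing all of those pages. A denser merge (putting the content of several source pages on a single result page) is generally not feasible, because in that context headers, footers, background graphics, and other artifacts would have to be recognized and ignored. For pages like yours a dense merge is feasible, but it is not yet available as a single utility method.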
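To make the difference concrete, here is a simplified model in plain Java. It does not use PDFBox, and the class and method names are hypothetical. Each source page is reduced to the height of its non-blank content; dense merging stacks those heights from top to bottom and starts a new result page only when the next block would not fit between the margins. The sketch assumes the content heights are already known, which in practice is what a bounding-box analysis of each source page has to supply.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model, not PDFBox code: every source page is represented only by
 * the height of its non-blank content.
 */
public class DenseMergeModel {

    /** Page-based merging: one result page per source page, blank space included. */
    static int pageBasedPageCount(List<Double> contentHeights) {
        return contentHeights.size();
    }

    /** Dense merging: returns the 0-based result page index for each content block. */
    static List<Integer> denseAssignment(List<Double> contentHeights,
                                         double pageHeight, double top, double bottom, double gap) {
        double usable = pageHeight - top - bottom;  // vertical space between the margins
        List<Integer> pages = new ArrayList<>();
        int page = 0;
        double used = -gap;  // the first block on a page needs no preceding gap
        for (double h : contentHeights) {
            if (h > usable) {
                throw new IllegalArgumentException("content too tall: " + h);
            }
            if (used + gap + h > usable) {  // block does not fit: start a new result page
                page++;
                used = -gap;
            }
            used += gap + h;
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Double> heights = List.of(300.0, 300.0, 300.0, 300.0);
        // page height of roughly A4 (about 842 pt), 30 pt margins, 10 pt gap
        System.out.println("page-based: " + pageBasedPageCount(heights) + " pages"); // page-based: 4 pages
        System.out.println("dense: " + denseAssignment(heights, 842, 30, 30, 10));   // dense: [0, 0, 1, 1]
    }
}
```

With four 300 pt content blocks, page-based merging needs four result pages while the dense model fits two blocks per page and needs only two.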
Such a utility class can be implemented like this:
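import java.awt.geom.Rectangle2D;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.pdfbox.multipdf.LayerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.util.Matrix;

// Note: BoundingBoxFinder is not a PDFBox class; it comes from the answer referenced after this listing.
public class PdfDenseMergeTool {
    /**
     * @param size   page size of the merged result
     * @param top    top margin of each result page
     * @param bottom bottom margin of each result page
     * @param gap    vertical space between content taken from consecutive source pages
     */
    public PdfDenseMergeTool(PDRectangle size, float top, float bottom, float gap)
    {
        this.pageSize = size;
        this.topMargin = top;
        this.bottomMargin = bottom;
        this.gap = gap;
    }

    /** Densely merges all pages of all inputs and saves the result to outputStream. */
    public void merge(OutputStream outputStream, Iterable<PDDocument> inputs) throws IOException
    {
        try
        {
            openDocument();
            for (PDDocument input: inputs)
            {
                merge(input);
            }
            // the content stream of the last page must be closed before saving
            if (currentContents != null) {
                currentContents.close();
                currentContents = null;
            }
            document.save(outputStream);
        }
        finally
        {
            closeDocument();
        }
    }

    void openDocument() throws IOException
    {
        document = new PDDocument();
        newPage();
    }

    void closeDocument() throws IOException
    {
        try
        {
            if (currentContents != null) {
                currentContents.close();
                currentContents = null;
            }
            document.close();
        }
        finally
        {
            this.document = null;
            this.yPosition = 0;
        }
    }

    /** Starts a new result page and resets yPosition to its top. */
    void newPage() throws IOException
    {
        if (currentContents != null) {
            currentContents.close();
            currentContents = null;
        }
        currentPage = new PDPage(pageSize);
        document.addPage(currentPage);
        // "+ gap" cancels the gap subtracted before the first block on the page
        yPosition = pageSize.getUpperRightY() - topMargin + gap;
        currentContents = new PDPageContentStream(document, currentPage);
    }

    void merge(PDDocument input) throws IOException
    {
        for (PDPage page : input.getPages())
        {
            merge(input, page);
        }
    }

    void merge(PDDocument sourceDoc, PDPage page) throws IOException
    {
        PDRectangle pageSizeToImport = page.getCropBox();
        // determine the area actually covered by content, ignoring the blank space around it
        BoundingBoxFinder boundingBoxFinder = new BoundingBoxFinder(page);
        boundingBoxFinder.processPage(page);
        Rectangle2D boundingBoxToImport = boundingBoxFinder.getBoundingBox();
        if (boundingBoxToImport == null)
        {
            // assumption: a null bounding box means nothing is drawn on the page, so skip it
            return;
        }
        double heightToImport = boundingBoxToImport.getHeight();
        float maxHeight = pageSize.getHeight() - topMargin - bottomMargin;
        if (heightToImport > maxHeight)
        {
            throw new IllegalArgumentException(String.format("Page %s content too large; height: %s, limit: %s.", page, heightToImport, maxHeight));
        }
        // start a new result page if gap plus content does not fit above the bottom margin
        if (gap + heightToImport > yPosition - (pageSize.getLowerLeftY() + bottomMargin))
        {
            newPage();
        }
        yPosition -= heightToImport + gap;
        // import the whole source page as a form XObject...
        LayerUtility layerUtility = new LayerUtility(document);
        PDFormXObject form = layerUtility.importPageAsForm(sourceDoc, page);
        currentContents.saveGraphicsState();
        // ...and shift it vertically so that the bottom of its content lands at yPosition
        Matrix matrix = Matrix.getTranslateInstance(0, (float)(yPosition - (boundingBoxToImport.getMinY() - pageSizeToImport.getLowerLeftY())));
        currentContents.transform(matrix);
        currentContents.drawForm(form);
        currentContents.restoreGraphicsState();
    }

    PDDocument document = null;
    PDPage currentPage = null;
    PDPageContentStream currentContents = null;
    float yPosition = 0;
    final PDRectangle pageSize;
    final float topMargin;
    final float bottomMargin;
    final float gap;
}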
(PdfDenseMergeTool utility class)
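The vertical placement in `merge(PDDocument, PDPage)` boils down to a little arithmetic, which this plain-Java sketch isolates (no PDFBox; `PlacementMath` and its methods are hypothetical names). It assumes, as the translation formula in the class implies, that the imported form puts the source crop box's lower-left corner at the origin, so content starting at `contentMinY` on the source page starts at `contentMinY - cropLowerLeftY` inside the form.

```java
/**
 * Hypothetical helper mirroring the placement arithmetic of PdfDenseMergeTool.
 * Assumption: the imported form places the source crop box's lower-left corner at the origin.
 */
public class PlacementMath {

    /** Vertical translation that moves the content's bottom edge to targetBottomY. */
    static double translateY(double targetBottomY, double contentMinY, double cropLowerLeftY) {
        double contentBottomInForm = contentMinY - cropLowerLeftY;
        return targetBottomY - contentBottomInForm;
    }

    /** yPosition on a fresh result page, as set in newPage(). */
    static double initialYPosition(double pageUpperRightY, double topMargin, double gap) {
        return pageUpperRightY - topMargin + gap;
    }

    public static void main(String[] args) {
        double gap = 10;
        double y = initialYPosition(842, 30, gap); // 822.0
        double contentHeight = 120;
        y -= contentHeight + gap;                  // bottom edge of the first block: 692.0
        // source content spans y = 650..770 on a page whose crop box starts at y = 0
        double ty = translateY(y, 650, 0);
        System.out.println("ty = " + ty);          // ty = 42.0
        // translated content spans 692..812, so its top sits exactly at 842 - 30
    }
}
```

The example also shows why `newPage()` adds `gap`: after the first block subtracts `heightToImport + gap`, the block's top edge ends up exactly on the top margin.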
It uses the BoundingBoxFinder class from this answer to an older question.
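The referenced `BoundingBoxFinder` is not reproduced here. As a hedged illustration of the idea only, the hypothetical class below grows a `java.awt.geom.Rectangle2D` by the area of every drawn element and returns `null` if nothing was drawn; how the real class collects those areas from the content stream is not shown.

```java
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical illustration, not the referenced BoundingBoxFinder:
 * unions the rectangles of all drawn elements into one bounding box.
 */
public class BoundingBoxAccumulator {
    private Rectangle2D box = null; // stays null while nothing has been drawn

    void add(Rectangle2D drawn) {
        if (box == null) {
            box = (Rectangle2D) drawn.clone(); // copy so the caller's rectangle is not mutated
        } else {
            box.add(drawn); // Rectangle2D.add replaces box with the union of both rectangles
        }
    }

    Rectangle2D getBoundingBox() {
        return box;
    }

    public static void main(String[] args) {
        BoundingBoxAccumulator acc = new BoundingBoxAccumulator();
        // three text lines near the top of an otherwise blank page: (x, lower y, width, height)
        acc.add(new Rectangle2D.Double(30, 780, 200, 12));
        acc.add(new Rectangle2D.Double(30, 764, 250, 12));
        acc.add(new Rectangle2D.Double(30, 748, 180, 12));
        Rectangle2D b = acc.getBoundingBox();
        System.out.println("minY=" + b.getMinY() + " height=" + b.getHeight()); // minY=748.0 height=44.0
    }
}
```

Only the 44 pt covered by the three lines counts toward `heightToImport`, not the full page height, which is what lets several such pages share one result page.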
You can use PdfDenseMergeTool like this:
PDDocument document1 = ...;
PDDocument document2 = ...;
PDDocument document3 = ...;
PDDocument document4 = ...;
PDDocument document5 = ...;
// A4 result pages, 30 pt top and bottom margins, 10 pt gap between merged blocks
PdfDenseMergeTool tool = new PdfDenseMergeTool(PDRectangle.A4, 30, 30, 10);
// try-with-resources makes sure the output file is closed after saving
try (OutputStream os = new FileOutputStream("Merge with Text.pdf"))
{
    tool.merge(os,
            Arrays.asList(document1, document2, document3, document4, document5,
                    document1, document2, document3, document4, document5,
                    document1, document2, document3, document4, document5));
}
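This merges the five source documents three times in a row.
For my test documents (each source document contains three lines of text) I get this result:
Page 1:
[screenshot]
Page 2:
[screenshot]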
This utility class is essentially a port of the iText PdfDenseMergeTool from this answer.
It has been tested with the current PDFBox 3.0.0 development branch SNAPSHOT.