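I have a Java class, Split, that splits a PDF file into multiple parts based on page ranges. The class uses PDFBox for this. I also have a PDFModel class to manage the resulting PDF files and a Range class to specify a page range.
This is the Split class: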
public class Split {
    private Logger logger;
    private File inputFile;
    private PDFModel pdfModel;
    private File outputDirectory;

    public Split(Logger logger, File inputFile, File outputDirectory) {
        // Constructor logic...
    }

    /**
     * Splits a PDF file based on a list of page ranges and saves the resulting partial PDFs.
     *
     * @param ranges A list of page ranges specifying which pages to split from the input PDF.
     * @return An ArrayList of PDFModel objects representing the resulting partial PDFs.
     */
    public ArrayList<PDFModel> splitByRanges(ArrayList<Range> ranges) {
        ArrayList<PDFModel> results = new ArrayList<>();
        for (int i = 0; i < ranges.size(); i++) {
            PDDocument partial = split(ranges.get(i));
            if (partial == null) {
                continue;
            }
            File outputFile = new File(Paths.get(outputDirectory.getAbsolutePath(), "file_" + i + ".pdf").toString());
            try {
                partial.save(outputFile);
                results.add(new PDFModel(outputFile, partial));
                logger.info(this, "Successfully splitted '" + inputFile + "' from page " + ranges.get(i).getFrom() + " to " + ranges.get(i).getTo() + " into '" + outputFile.getAbsolutePath() + "'");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return results;
    }

    private PDDocument split(Range range) {
        PDDocument result = new PDDocument();
        int fromPage = range.getFrom();
        int toPage = range.getTo();
        // Get the PDPageTree from the PDDocument
        PDPageTree pdPageTree = pdfModel.getPDDocument().getPages();
        if (fromPage <= 0 || toPage <= 0 || fromPage > toPage || toPage > pdPageTree.getCount()) {
            logger.warning(this, "Invalid page range for splitting.");
            return null;
        }
        for (int i = range.getFrom() - 1; i < range.getTo(); i++) {
            result.addPage(pdPageTree.get(i));
        }
        return result;
    }
}
org.apache.pdfbox.multipdf.Splitter does the same job, but that does not work either:
private PDDocument split(Range range) {
    int fromPage = range.getFrom();
    int toPage = range.getTo();
    PDDocument pddocument = pdfModel.getPDDocument();
    Splitter splitter = new Splitter();
    splitter.setStartPage(fromPage);
    splitter.setEndPage(toPage);
    splitter.setSplitAtPage(toPage - fromPage + 1);
    List<PDDocument> lst = null;
    try {
        lst = splitter.split(pddocument);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return lst.get(0);
}
The PDFModel class:
public class PDFModel {
    private File file;
    private PDDocument pdDocument;
    private ArrayList<PDFImage> images;
    private ArrayList<String> pages;

    public PDFModel(File file, PDDocument pdDocument) {
        // Constructor logic...
    }
}
The Range class:
public class Range {
    private int from;
    private int to;

    public Range(int from, int to) {
        // Constructor logic...
    }
}
I am trying to use this Split class to split a PDF file into multiple parts.
This call throws the error:
Split splitter = new Split(logger, inputFile, outputDirectory);
splitter.splitByRanges(new ArrayList<Range>(Arrays.asList(new Range(1, 7), new Range(8, 9), new Range(10, 11))));
This call works fine (though not with org.apache.pdfbox.multipdf.Splitter):
Split splitter = new Split(logger, inputFile, outputDirectory);
splitter.splitByRanges(new ArrayList<Range>(Arrays.asList(new Range(1, 8), new Range(10, 12), new Range(14, 16))));
However, I am getting the following StackOverflowError:
Exception in thread "main" java.lang.StackOverflowError
at java.base/java.util.HashMap.tableSizeFor(HashMap.java:378)
at java.base/java.util.HashMap.<init>(HashMap.java:455)
at java.base/java.util.LinkedHashMap.<init>(LinkedHashMap.java:439)
at java.base/java.util.HashSet.<init>(HashSet.java:171)
at java.base/java.util.LinkedHashSet.<init>(LinkedHashSet.java:167)
at org.apache.pdfbox.util.SmallMap.entrySet(SmallMap.java:384)
at org.apache.pdfbox.cos.COSDictionary.entrySet(COSDictionary.java:1232)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:338)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:232)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:343)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:232)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:321)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:228)
How can I resolve this StackOverflowError?
The problem appears to be on the pdfbox side, so this is only a workaround for version 3.0.0:
private PDDocument split(Range range) {
    // Start from a document that holds every page of the source.
    PDDocument pdDocument = new PDDocument();
    for (PDPage pdPage : pdfModel.getPDDocument().getPages()) {
        pdDocument.addPage(pdPage);
    }
    int fromPage = range.getFrom();
    int toPage = range.getTo();
    int pageCount = pdDocument.getNumberOfPages();
    // Same validation as in the question: 1-based, inclusive, within the document.
    if (fromPage <= 0 || toPage <= 0 || fromPage > toPage || toPage > pageCount) {
        logger.warning(this, "Invalid page range for splitting.");
        return null;
    }
    System.out.println("Page count: " + pdDocument.getNumberOfPages());
    // Remove the pages after the range, back to front so indices stay valid.
    for (int n = pageCount - 1; n >= toPage; n--) {
        pdDocument.removePage(n);
    }
    // Remove the pages before the range, again back to front.
    for (int n = fromPage - 2; n >= 0; n--) {
        pdDocument.removePage(n);
    }
    return pdDocument;
}
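The index arithmetic in the two removal loops above is easy to get wrong by one, so here is a small sketch that applies the same back-to-front removal to a plain list of page numbers, with no PDFBox involved. RangeTrim and keepRange are hypothetical names used only for illustration; they are not part of the workaround or of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeTrim {
    // Hypothetical helper: keeps the 1-based, inclusive range [fromPage, toPage]
    // of a document with pageCount pages by removing everything else.
    public static List<Integer> keepRange(int pageCount, int fromPage, int toPage) {
        if (fromPage <= 0 || fromPage > toPage || toPage > pageCount) {
            throw new IllegalArgumentException("Invalid page range: " + fromPage + "-" + toPage);
        }
        // Stand-in for the copied document: page numbers 1..pageCount.
        List<Integer> pages = new ArrayList<>();
        for (int p = 1; p <= pageCount; p++) {
            pages.add(p);
        }
        // Remove trailing pages (0-based indices toPage .. pageCount - 1),
        // back to front so the remaining indices do not shift.
        for (int n = pageCount - 1; n >= toPage; n--) {
            pages.remove(n);
        }
        // Remove leading pages (0-based indices 0 .. fromPage - 2).
        for (int n = fromPage - 2; n >= 0; n--) {
            pages.remove(n);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Range(8, 9) from the failing call in the question, on a 12-page document.
        System.out.println(keepRange(12, 8, 9)); // prints [8, 9]
    }
}
```

Removing from the highest index downward matters here: removing from the front first would shift every later page and the second loop would drop the wrong ones.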
Preferably, use an org.apache.pdfbox version newer than 3.0.0: the bug is fixed in 3.0.1 and later.