StackOverflowError when splitting a PDF file with PDFBox v3.0.0


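I have a Java class, Split, that splits a PDF file into multiple parts based on page ranges. The class uses PDFBox for this. I also have a PDFModel class to manage the generated PDF files and a Range class to specify page ranges.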

Here is the Split class:

public class Split{
    private Logger logger;
    private File inputFile;
    private PDFModel pdfModel;
    private File outputDirectory;

    public Split(Logger logger, File inputFile, File outputDirectory) {
        // Constructor logic...
    }

    /**
     * Splits a PDF file based on a list of page ranges and saves the resulting partial PDFs.
     *
     * @param ranges A list of page ranges specifying which pages to split from the input PDF.
     * @return An ArrayList of PDFModel objects representing the resulting partial PDFs.
     */
    public ArrayList<PDFModel> splitByRanges(ArrayList<Range> ranges){
        ArrayList<PDFModel> results = new ArrayList<>();
        
        for (int i = 0; i < ranges.size(); i++) {
            PDDocument partial = split(ranges.get(i));
            
            if(partial == null) {
                continue;
            }
            
            File outputFile = new File(Paths.get(outputDirectory.getAbsolutePath(), "file_" + i + ".pdf").toString());
            
            try {
                partial.save(outputFile);
                results.add(new PDFModel(outputFile, partial));
                
                logger.info(this, "Successfully split '" + inputFile + "' from page " + ranges.get(i).getFrom() + " to " + ranges.get(i).getTo() + " into '" + outputFile.getAbsolutePath() + "'");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        
        return results;
    }

    private PDDocument split(Range range) {
        PDDocument result = new PDDocument();

        int fromPage = range.getFrom();
        int toPage = range.getTo();
        
         // Get the PDPageTree from the PDDocument
        PDPageTree pdPageTree = pdfModel.getPDDocument().getPages();
        
        if (fromPage <= 0 || toPage <= 0 || fromPage > toPage || toPage > pdPageTree.getCount()) {
            logger.warning(this, "Invalid page range for splitting.");
            return null;
        }
        
        for (int i = range.getFrom() -1; i < range.getTo(); i++) {
            result.addPage(pdPageTree.get(i));
        }
        
        return result;
    }

}

Using org.apache.pdfbox.multipdf.Splitter does the same job, but it does not work either:

private PDDocument split(Range range) {
    int fromPage = range.getFrom();
    int toPage = range.getTo() ;

    PDDocument pddocument = pdfModel.getPDDocument();

    Splitter splitter = new Splitter();

    splitter.setStartPage(fromPage);
    splitter.setEndPage(toPage);
    splitter.setSplitAtPage(toPage - fromPage +1 );

    List<PDDocument> lst = null;
    
    try {
        lst = splitter.split(pddocument);
    } catch (IOException e) {
        e.printStackTrace();
    }

    return lst.get(0);
}

The PDFModel class:

public class PDFModel {
    private File file;
    private PDDocument pdDocument;
    private ArrayList<PDFImage> images;
    private ArrayList<String> pages;

    public PDFModel(File file, PDDocument pdDocument) {
        // Constructor logic...
    }
}

The Range class:

public class Range {
    private int from;
    private int to;

    public Range(int from, int to) {
        // Constructor logic...
    }
}

I am trying to use this Split class to split a PDF file into multiple parts. The following call throws the error:

Split splitter = new Split(logger, inputFile, outputDirectory);
splitter.splitByRanges(new ArrayList<Range>(Arrays.asList(new Range(1, 7), new Range(8, 9), new Range(10, 11))));

This call works fine (though not with org.apache.pdfbox.multipdf.Splitter):

Split splitter = new Split(logger, inputFile, outputDirectory);
splitter.splitByRanges(new ArrayList<Range>(Arrays.asList(new Range(1, 8), new Range(10, 12), new Range(14, 16))));

The failing call produces the following StackOverflowError:

Exception in thread "main" java.lang.StackOverflowError
    at java.base/java.util.HashMap.tableSizeFor(HashMap.java:378)
    at java.base/java.util.HashMap.<init>(HashMap.java:455)
    at java.base/java.util.LinkedHashMap.<init>(LinkedHashMap.java:439)
    at java.base/java.util.HashSet.<init>(HashSet.java:171)
    at java.base/java.util.LinkedHashSet.<init>(LinkedHashSet.java:167)
    at org.apache.pdfbox.util.SmallMap.entrySet(SmallMap.java:384)
    at org.apache.pdfbox.cos.COSDictionary.entrySet(COSDictionary.java:1232)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:338)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:232)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:343)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:232)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:321)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:228)

Question

How can I resolve this StackOverflowError?

java stack-overflow pdfbox
1 Answer

Solution 1

The problem appears to be on the PDFBox side, so this is only a workaround for version 3.0.0:

private PDDocument split(Range range) {
    PDDocument pdDocument = new PDDocument();

    // Start from a copy of every page, then remove the pages outside the range.
    for (PDPage pdPage : pdfModel.getPDDocument().getPages()) {
        pdDocument.addPage(pdPage);
    }

    int fromPage = range.getFrom();
    int toPage = range.getTo();

    int pageCount = pdDocument.getNumberOfPages();

    // Page numbers are 1-based and inclusive; reject anything outside 1..pageCount.
    if (fromPage <= 0 || toPage <= 0 || fromPage > toPage || toPage > pageCount) {
        logger.warning(this, "Invalid page range for splitting.");
        return null;
    }

    System.out.println("Page count: " + pdDocument.getNumberOfPages());

    // Remove the pages after the range, highest index first so the indices stay valid.
    for (int n = pageCount - 1; n >= toPage; n--) {
        pdDocument.removePage(n);
    }

    // Remove the pages before the range (0-based indices 0 .. fromPage - 2).
    for (int n = fromPage - 2; n >= 0; n--) {
        pdDocument.removePage(n);
    }

    return pdDocument;
}
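
The index arithmetic in this workaround is easy to get wrong by one, so it can help to check it without PDFBox. The following is only a sketch: a plain java.util.List stands in for the page tree, and the class PageRangeSketch and its helpers isValidRange and keepRange are hypothetical names that do not exist in the question or in PDFBox. It assumes page numbers are 1-based and inclusive, as in the Range class above.

```java
import java.util.ArrayList;
import java.util.List;

public class PageRangeSketch {

    // Hypothetical helper: true when [from, to] is a valid 1-based, inclusive range.
    static boolean isValidRange(int from, int to, int pageCount) {
        return from > 0 && to > 0 && from <= to && to <= pageCount;
    }

    // Hypothetical helper mirroring the workaround: copy everything, then remove
    // the entries outside the range, highest index first.
    static <T> List<T> keepRange(List<T> pages, int from, int to) {
        if (!isValidRange(from, to, pages.size())) {
            return null; // the workaround also returns null for a bad range
        }
        List<T> copy = new ArrayList<>(pages);
        // Entries after the range: 0-based indices to .. size - 1.
        for (int n = copy.size() - 1; n >= to; n--) {
            copy.remove(n);
        }
        // Entries before the range: 0-based indices 0 .. from - 2.
        for (int n = from - 2; n >= 0; n--) {
            copy.remove(n);
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            pages.add("p" + i);
        }
        System.out.println(keepRange(pages, 8, 9));   // prints [p8, p9]
        System.out.println(keepRange(pages, 10, 13)); // prints null
    }
}
```

Removing from the back in both loops means no removal shifts an index that is still to be visited, which is why the two loops can use the original page numbers directly.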

Solution 2

Preferably, use a version of org.apache.pdfbox later than 3.0.0; the bug is fixed in 3.0.1 and above.
