iText: Splitting a PDF document at barcode separator pages


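I am facing the following use case:

I receive a PDF that contains many documents. Each document has a different number of pages. The documents are separated by barcode pages.

Is it possible to split a multi-page PDF that contains several documents, separated by pages carrying a barcode, and create one new PDF per document?

I read that we can split a PDF with iText: https://developers.itextpdf.com/examples/stamping-content-existing-pdfs/clone-splitting-pdf-file

But I have not found anything online on how to split the file at the pages where I detect a barcode.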

Update: @mkl I have found out how to read the text of a QR code with ZXing. It works for a simple PNG file:

File QRfile = new File("test.png");

BufferedImage bufferedImg = ImageIO.read(QRfile);
LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));

Result result = new MultiFormatReader().decode(bitmap);

System.out.println("Barcode Format: " + result.getBarcodeFormat());
System.out.println("Content: " + result.getText());

But it does not work inside a loop. I tested it with a PDF file (7 pages).

Here is the Java code:

PdfDocument pdfDoc;
pdfDoc = new PdfDocument(new PdfReader(pathName));
logger.debug("pdfDoc OK"); 
PdfDocumentContentParser contentParser = new PdfDocumentContentParser(pdfDoc);
for (int page = 1; page <= pdfDoc.getNumberOfPages(); page++)
{
    logger.debug("page: " + page); 
    contentParser.processContent(page, new IEventListener()
    {
        @Override
        public Set<EventType> getSupportedEvents()
        {
            logger.debug("inside getSupportedEvents"); 
            return Collections.singleton(RENDER_IMAGE);
        }

        @Override
        public void eventOccurred(IEventData data, EventType type)
        {
            index = index + 1;
            logger.debug("inside eventOccurred - data: " + data);
            logger.debug("inside eventOccurred - type: " + type);
            logger.debug("inside eventOccurred - index: " + index);
            if (data instanceof ImageRenderInfo)
            {
                logger.debug("data instanceof ImageRenderInfo"); 
                ImageRenderInfo imageRenderInfo = (ImageRenderInfo) data;
                byte[] bytes = imageRenderInfo.getImage().getImageBytes();
                try
                {
                    logger.debug("avant Files writer");
                    String pngName = "C:/alfresco/klinck/splitImage-" + index + ".png";
                    logger.debug("pngName: " + pngName);
                    Files.write(new File(pngName).toPath(), bytes);
                    logger.debug("Files written");
                    File QRfile = new File(pngName);
                    logger.debug("QR File trouvé ! ");
                    BufferedImage bufferedImg = ImageIO.read(QRfile);
                    logger.debug("bufferedImg OK ");
                    LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
                    logger.debug("source OK ");
                    BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
                    logger.debug("bitmap OK");
                    Result result = new MultiFormatReader().decode(bitmap);
                    logger.debug("SplitFluxJobExcecuter - resultBarcodeFormat: " + result.getBarcodeFormat());
                    logger.debug("SplitFluxJobExcecuter - result.getText(): " + result.getText());
                }catch (Exception e)
                {
                   logger.error("SplitJobExecuter Exception : " + ExceptionUtils.getStackTrace(e));
                }
            }
        }
        int index = 0;
    });
}

The first page contains 3 images (one of them is a QR code). I get a "com.google.zxing.NotFoundException" at the last step.

Here is the log:

2018-07-25 16:27:00,227 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pdfDoc OK
2018-07-25 16:27:00,227 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] page: 1
2018-07-25 16:27:00,237 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside getSupportedEvents

2018-07-25 16:27:00,265 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@2472ac79
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 1
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-1.png
2018-07-25 16:27:00,270 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,270 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé ! 
2018-07-25 16:27:00,304 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK 
2018-07-25 16:27:00,305 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK 
2018-07-25 16:27:00,306 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,407 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException

2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@6e036aea
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 2
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,408 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,408 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-2.png
2018-07-25 16:27:00,411 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,411 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé ! 
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK 
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK 
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,473 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException

2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@4c205db7
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 3
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-3.png
2018-07-25 16:27:00,478 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,478 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé ! 
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK 
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK 
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,484 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException

For pages 2 to 7, the error message is different:

2018-07-25 16:27:00,487 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] page: 2
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside getSupportedEvents
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@6d41ffa2
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 1
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-1.png
2018-07-25 16:27:00,492 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,493 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé ! 
2018-07-25 16:27:00,493 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK 
2018-07-25 16:27:00,493 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : java.lang.NullPointerException
    at com.google.zxing.client.j2se.BufferedImageLuminanceSource.<init>(BufferedImageLuminanceSource.java:42)
    at com.klinck.mc.jobs.SplitFluxJobExecuter$1.eventOccurred(SplitFluxJobExecuter.java:150)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.eventOccurred(PdfCanvasProcessor.java:534)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayImage(PdfCanvasProcessor.java:573)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$5800(PdfCanvasProcessor.java:108)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$ImageXObjectDoHandler.handleXObject(PdfCanvasProcessor.java:1420)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayXObject(PdfCanvasProcessor.java:566)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$5600(PdfCanvasProcessor.java:108)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$DoOperator.invoke(PdfCanvasProcessor.java:1285)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.invokeOperator(PdfCanvasProcessor.java:452)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:281)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:302)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:77)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:90)
    at com.klinck.mc.jobs.SplitFluxJobExecuter.execute(SplitFluxJobExecuter.java:118)
    at com.klinck.mc.jobs.SplitFluxJob$1.doWork(SplitFluxJob.java:27)
    at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:555)
    at com.klinck.mc.jobs.SplitFluxJob.executeJob(SplitFluxJob.java:24)
    at org.alfresco.schedule.ScheduledJobLockExecuter.execute(ScheduledJobLockExecuter.java:94)
    at org.alfresco.schedule.AbstractScheduledLockedJob.executeInternal(AbstractScheduledLockedJob.java:72)
    at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:563)
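
The NullPointerException above is thrown inside the BufferedImageLuminanceSource constructor right after the "bufferedImg OK" log line, which is consistent with ImageIO.read having returned null: it does that, instead of throwing, when no registered image reader recognises the data. The following is a minimal sketch of a null-safe read using only the JDK; the class and method names (SafeImageRead, tryRead, samplePng) are hypothetical and not part of the original code.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.imageio.ImageIO;

final class SafeImageRead {
    static {
        // Keep ImageIO from creating temporary cache files for in-memory streams.
        ImageIO.setUseCache(false);
    }

    /** Returns the decoded image, or an empty Optional if no ImageIO reader understands the bytes. */
    static Optional<BufferedImage> tryRead(byte[] bytes) {
        try {
            // ImageIO.read returns null (it does not throw) for unrecognised formats.
            return Optional.ofNullable(ImageIO.read(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a tiny valid PNG in memory, only so the demo has something decodable. */
    static byte[] samplePng() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB), "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid PNG decodes: " + tryRead(samplePng()).isPresent());
        System.out.println("junk bytes decode: " + tryRead(new byte[] {1, 2, 3, 4}).isPresent());
    }
}
```

In the listener, checking the result for emptiness before constructing the LuminanceSource would turn the NullPointerException into an explicit "not a readable image" branch. Why the bytes from pages 2 to 7 are not readable is not established here.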

Update 2

I think the "com.google.zxing.NotFoundException" error message appears because the image does not contain a text message or is too large: com.google.zxing.NotFoundException exception comes when core java program executed?
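
If image size really is a factor, one thing to try is shrinking the image before handing it to ZXing. This is only a sketch under that assumption, using plain java.awt; the class name ImageDownscaler, the method fitWithin, and any particular size limit are hypothetical, and it is not confirmed that downscaling resolves the exception in this case.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class ImageDownscaler {
    /**
     * Returns src unchanged if its longest side is at most maxSide,
     * otherwise a scaled copy whose longest side equals maxSide.
     */
    static BufferedImage fitWithin(BufferedImage src, int maxSide) {
        int width = src.getWidth();
        int height = src.getHeight();
        int longest = Math.max(width, height);
        if (longest <= maxSide) {
            return src;
        }
        double scale = (double) maxSide / longest;
        int newWidth = Math.max(1, (int) Math.round(width * scale));
        int newHeight = Math.max(1, (int) Math.round(height * scale));

        BufferedImage dst = new BufferedImage(newWidth, newHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bilinear interpolation keeps module edges reasonably clean when shrinking.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, newWidth, newHeight, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage big = new BufferedImage(400, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = fitWithin(big, 100);
        System.out.println(small.getWidth() + " x " + small.getHeight());
    }
}
```

The returned image can be passed to BufferedImageLuminanceSource exactly like the original one.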

pdf split itext barcode
1 Answer

The following approach worked for me:

Step 1:

Detect the specific QR code and store the page numbers in a list:

PdfDocument pdfDoc = new PdfDocument(new PdfReader(pathName));
logger.debug("pdfDoc OK");
PdfDocumentContentParser contentParser = new PdfDocumentContentParser(pdfDoc);
List<Integer> pageList = new ArrayList<Integer>();
// One-element array so that the anonymous listener can read the current page number.
int[] currentPage = new int[1];
for (int page = 1; page <= pdfDoc.getNumberOfPages(); page++) {
    currentPage[0] = page;
    contentParser.processContent(page, new IEventListener() {
        // Counts the images seen on this page (a new listener is created for every page).
        int index = 0;

        @Override
        public Set<EventType> getSupportedEvents() {
            return Collections.singleton(RENDER_IMAGE);
        }

        @Override
        public void eventOccurred(IEventData data, EventType type) {
            index = index + 1;
            if (data instanceof ImageRenderInfo) {
                logger.debug("data instanceof ImageRenderInfo");
                ImageRenderInfo imageRenderInfo = (ImageRenderInfo) data;
                byte[] bytes = imageRenderInfo.getImage().getImageBytes();
                String pngName = coreServices.getSplitFolderTemp() + "Page-" + currentPage[0] + "_Image-" + index + ".png";
                logger.debug("pngName: " + pngName);
                File image = new File(pngName);
                try {
                    // The Klinck QR code is stored in the first image of the separator sheet.
                    if (index == 1) {
                        // ZXing: read the data from the QR code.
                        Files.write(image.toPath(), bytes);
                        BufferedImage bufferedImg = ImageIO.read(image);
                        LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
                        BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
                        Result result = new MultiFormatReader().decode(bitmap);
                        if (result.getBarcodeFormat() == BarcodeFormat.QR_CODE && "SEPARATEUR".equals(result.getText())) {
                            // Store the page numbers of the Klinck QR codes.
                            pageList.add(currentPage[0]);
                            logger.debug("Klinck QR code found on page: " + currentPage[0]);
                        }
                    }
                } catch (Exception e) {
                    logger.error("The detected image is not the Klinck QR code: " + ExceptionUtils.getStackTrace(e));
                }
                // Clean up the temporary image file, if one was written.
                if (image.delete()) {
                    logger.debug("image deleted");
                }
            }
        }
    });
}

Step 2: Create the PDFs

logger.debug("Creating the PDFs");
if (pageList.isEmpty()) {
    // No separator found: the input is a single document.
    logger.debug("a single document");
    PdfDocument pdfDest = new PdfDocument(new PdfWriter("C:/alfresco/klinck/onePdf.pdf"));
    pdfDoc.copyPagesTo(1, pdfDoc.getNumberOfPages(), pdfDest);
    pdfDest.close();
} else {
    // One or more QR codes: normally at least two documents.
    logger.debug("list length: " + pageList.size());
    int start = 1;
    for (int index = 0; index < pageList.size(); index++) {
        int separatorPage = pageList.get(index);
        logger.debug("Klinck QR code found on page " + separatorPage);
        logger.debug("Next document, pages " + start + " to " + (separatorPage - 1));
        // Skip empty ranges: a separator on page 1, or two separators in a row.
        if (start < separatorPage) {
            PdfDocument pdfDest = new PdfDocument(new PdfWriter("C:/alfresco/klinck/splitPdf-" + start + ".pdf"));
            pdfDoc.copyPagesTo(start, separatorPage - 1, pdfDest);
            pdfDest.close();
        }
        start = separatorPage + 1;
    }

    // Handle the last document, unless the final page is itself a separator.
    if (start <= pdfDoc.getNumberOfPages()) {
        PdfDocument pdfDest = new PdfDocument(new PdfWriter("C:/alfresco/klinck/splitPdf-" + start + ".pdf"));
        pdfDoc.copyPagesTo(start, pdfDoc.getNumberOfPages(), pdfDest);
        pdfDest.close();
    }
}

pdfDoc.close();
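
The page arithmetic of step 2 can also be isolated from iText and checked on its own. The sketch below is one possible way to do that, assuming the separator page numbers are sorted in ascending order (which is how step 1 collects them); the names DocumentRanges and between are hypothetical. It also skips empty ranges, so a separator on the first or last page, or two separators in a row, do not produce an empty document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DocumentRanges {
    /**
     * Returns inclusive, 1-based {firstPage, lastPage} pairs, one per document,
     * for the pages lying between the given separator pages.
     * Assumes separatorPages is sorted in ascending order.
     */
    static List<int[]> between(List<Integer> separatorPages, int pageCount) {
        List<int[]> ranges = new ArrayList<>();
        int start = 1;
        for (int separator : separatorPages) {
            // Only emit a range if at least one page precedes the separator.
            if (separator > start) {
                ranges.add(new int[] {start, separator - 1});
            }
            start = separator + 1;
        }
        // Pages after the last separator form the final document.
        if (start <= pageCount) {
            ranges.add(new int[] {start, pageCount});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Separators on pages 3 and 6 of an 8-page file.
        for (int[] range : between(Arrays.asList(3, 6), 8)) {
            System.out.println(range[0] + "-" + range[1]);
        }
        // prints 1-2, 4-5 and 7-8 on separate lines
    }
}
```

Each returned pair could then be passed to pdfDoc.copyPagesTo(first, last, pdfDest), as in the code above.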