How to distinguish the background color from the text color?


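When the background color and the text color are the same, can anyone help me tell the background color of a PDF document apart from its text color?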
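A minimal way to express "the text color is the same as the background color" is a distance test between two RGB values. The sketch below assumes both colors are already available as packed 0xRRGGBB ints: in PDFBox 2.0, PDColor.toRGB() returns a fill color in that form, while obtaining the background color (for example by rendering the page and sampling pixels) is left open. The class name TextVisibility and the threshold of 30 are hypothetical choices, not values from the question.

```java
/** Hypothetical helper: decides whether text is hard to tell apart from its background. */
final class TextVisibility {

    // Assumed threshold (Euclidean distance in 0..255 RGB space); tune it for your documents.
    static final double DEFAULT_THRESHOLD = 30.0;

    /** Euclidean distance between two packed 0xRRGGBB colors. */
    static double rgbDistance(int rgbA, int rgbB) {
        int dr = ((rgbA >> 16) & 0xFF) - ((rgbB >> 16) & 0xFF);
        int dg = ((rgbA >> 8) & 0xFF) - ((rgbB >> 8) & 0xFF);
        int db = (rgbA & 0xFF) - (rgbB & 0xFF);
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }

    /** True when the text color is closer to the background color than the threshold. */
    static boolean isEffectivelyInvisible(int textRgb, int backgroundRgb, double threshold) {
        return rgbDistance(textRgb, backgroundRgb) < threshold;
    }

    public static void main(String[] args) {
        int silverText = 0xC0C0C0;
        int silverBackground = 0xC4C2C0;
        int blackText = 0x000000;
        System.out.println(isEffectivelyInvisible(silverText, silverBackground, DEFAULT_THRESHOLD)); // true
        System.out.println(isEffectivelyInvisible(blackText, silverBackground, DEFAULT_THRESHOLD));  // false
    }
}
```

A perceptual color difference (for example in a Lab color space) would be more accurate than plain RGB distance; the simple version is used here only to keep the idea visible.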

What I actually need is to use PDFBox to set a fixed color on the invisible text so that it becomes visible.
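Text can be invisible for two different reasons: its fill color matches the background, or its text rendering mode (the Tr operator) is 3 (neither fill nor stroke) or 7 (add to clipping path only), in which case no color change will help. Since putTextOnDocument copies the original rendering mode, a sketch of the decision might look like this. The class and method names are hypothetical, and the Rec. 601 luma weights are just one simple brightness estimate; the results would then be passed to setRenderingMode(...) and setNonStrokingColor(...) on the PDPageContentStream.

```java
/** Hypothetical helper: chooses how to re-draw text so that it becomes visible. */
final class VisibleTextStyle {

    /** PDF text rendering modes (Tr) 3 and 7 paint no glyphs at all. */
    static boolean paintsNothing(int renderingMode) {
        return renderingMode == 3 || renderingMode == 7;
    }

    /**
     * Keeps the original mode unless it paints nothing; then falls back to 0 (fill).
     * Note: replacing mode 7 also drops its clipping effect.
     */
    static int visibleRenderingMode(int renderingMode) {
        return paintsNothing(renderingMode) ? 0 : renderingMode;
    }

    /** Picks black or white, whichever contrasts more with the (packed 0xRRGGBB) background. */
    static int contrastingColor(int backgroundRgb) {
        int r = (backgroundRgb >> 16) & 0xFF;
        int g = (backgroundRgb >> 8) & 0xFF;
        int b = backgroundRgb & 0xFF;
        double luma = 0.299 * r + 0.587 * g + 0.114 * b; // Rec. 601 weights
        return luma >= 128 ? 0x000000 : 0xFFFFFF;
    }

    public static void main(String[] args) {
        System.out.println(visibleRenderingMode(3));              // 0
        System.out.println(visibleRenderingMode(2));              // 2
        System.out.printf("%06X%n", contrastingColor(0xC0C0C0));  // 000000
        System.out.printf("%06X%n", contrastingColor(0x202020));  // FFFFFF
    }
}
```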

TextObjectInfo holds all the text-object information collected with PDFStreamEngine.

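// PDFBox 2.0.x imports; TextObjectInfo and PDFTextObjectInfoExtraction are my own classes (not shown)
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.contentstream.PDContentStream;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.OperatorName;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.state.PDGraphicsState;
import org.apache.pdfbox.pdmodel.graphics.state.PDTextState;
import org.apache.pdfbox.util.Matrix;

public class SimplePdfRegeneretor {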
    private PDDocument _document;
    private PDResources _pageResource;
    private PDFTextObjectInfoExtraction _PDFTextObjectInfoExtraction;
    private List<List<TextObjectInfo>> _documentTextObjectInfo;
    
    private void RecreatePDF() throws IOException{
        int _pageNo = 0;
        for (PDPage page : _document.getPages())
        {
            List<TextObjectInfo> _pageTextObjectInfo = this._documentTextObjectInfo.get(_pageNo);
            try (PDPageContentStream contentStream = new PDPageContentStream(_document,
                    page, AppendMode.APPEND, false, true))
            {
                int _textObjInfoInx = 0;
                //contentStream.setNonStrokingColor(0,0,0,0);
                for (TextObjectInfo _textObjInfo : _pageTextObjectInfo)
                {
                    Float _xmin = _textObjInfo.get_xyminmax().get(0);
                    Float _ymin = _textObjInfo.get_xyminmax().get(1);
                    putTextOnDocument(contentStream, _textObjInfo, _textObjInfo.TextFontObject, _xmin, _ymin, _textObjInfoInx);
                    _textObjInfoInx++;
                }
            }
            _pageNo++;
        }

        _pageNo = 0;
        for (PDPage _page : _document.getPages())
        {
            List<Object> newTokens = addTjStringtoContenStream(_page, _pageNo);
            PDStream newContents = new PDStream(_document);
            writeTokensToStream(newContents, newTokens);
            _page.setContents(newContents);
            System.out.println("Page TextObject writing completed: " + _pageNo);
            _pageNo++;
        }

    }
    
    private void putTextOnDocument(PDPageContentStream contentStream, TextObjectInfo _textObjInfo, PDFont font,
                                   Float horizontalPixel, Float verticalPixel, int TextObjectIndex) throws IOException {

        String _textobjstr = "TextObjectIndex-" + TextObjectIndex;
        Matrix _tm = _textObjInfo.textMatrixs.get(_textObjInfo.textMatrixs.size()-1);
        // keep fractional sizes such as 10.5 instead of truncating them with intValue()
        float fontSize = _textObjInfo.TextFontSize.floatValue();
        PDGraphicsState _GraphicsState = _textObjInfo.getGraphicsState();
        PDTextState _TextState = _GraphicsState.getTextState();
        
        contentStream.beginText();
        contentStream.setNonStrokingColor(_GraphicsState.getNonStrokingColor());
        contentStream.setStrokingColor(_GraphicsState.getStrokingColor());
        contentStream.setRenderingMode(_TextState.getRenderingMode());
        contentStream.setFont(font, fontSize);
        contentStream.setTextMatrix(_tm);
        contentStream.beginMarkedContent(COSName.getPDFName(_textobjstr));
        contentStream.endMarkedContent();
        contentStream.endText();
    }

    private List<Object> addTjStringtoContenStream(PDContentStream contentStream, int _pgInx) throws IOException{
        final String markerPrefix = "TextObjectIndex-";
        PDFStreamParser parser = new PDFStreamParser(contentStream);
        List<Object> newTokens = new ArrayList<>();
        List<TextObjectInfo> _pageTextObjInfo = this._documentTextObjectInfo.get(_pgInx);
        System.out.println("Len of _pageTextObjInfo: " + _pageTextObjInfo.size());
        // set right after one of our placeholder BMCs was replaced, so that only its EMC
        // is dropped; BMC/BDC ... EMC pairs from the original page content stay intact
        boolean skipNextEmc = false;
        Object token = parser.parseNextToken();
        while (token != null)
        {
            if (token instanceof Operator)
            {
                String opName = ((Operator) token).getName();
                Object lastOperand = newTokens.isEmpty() ? null : newTokens.get(newTokens.size() - 1);
                if (OperatorName.BEGIN_MARKED_CONTENT.equals(opName)
                        && lastOperand instanceof COSName
                        && ((COSName) lastOperand).getName().startsWith(markerPrefix))
                {
                    // placeholder written by putTextOnDocument:
                    // replace "/TextObjectIndex-N BMC" with "(original string) Tj"
                    int _tjObjInx = Integer.parseInt(((COSName) lastOperand).getName().substring(markerPrefix.length()));
                    COSString _tjStr = _pageTextObjInfo.get(_tjObjInx).TjString;
                    newTokens.remove(newTokens.size() - 1);
                    newTokens.add(_tjStr);
                    newTokens.add(Operator.getOperator("Tj"));
                    skipNextEmc = true;
                    token = parser.parseNextToken();
                    continue;
                }
                else if (OperatorName.END_MARKED_CONTENT.equals(opName) && skipNextEmc)
                {
                    // drop the EMC that closed our placeholder
                    skipNextEmc = false;
                    token = parser.parseNextToken();
                    continue;
                }
            }
            newTokens.add(token);
            token = parser.parseNextToken();
        }
        return newTokens;
    }
    private static void writeTokensToStream(PDStream newContents, List<Object> newTokens) throws IOException
    {
        try (OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE))
        {
            ContentStreamWriter writer = new ContentStreamWriter(out);
            writer.writeTokens(newTokens);
        }
    }
}
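The code above works in two passes: putTextOnDocument writes an empty placeholder "/TextObjectIndex-N BMC EMC" inside each BT/ET, and addTjStringtoContenStream later swaps that placeholder for "(string) Tj" using the stored TjString. The toy model below reproduces that token rewrite with plain Java types; Name and Op are hypothetical stand-ins for COSName and Operator, not PDFBox classes (Java 16+ for records and pattern matching). It also shows why only the EMC that closes a placeholder should be dropped: marked content already present in the page, such as /Artifact BMC ... EMC, passes through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model (not PDFBox types) of the placeholder-replacement pass. */
final class PlaceholderRewrite {
    record Name(String value) {}   // stands in for COSName
    record Op(String value) {}     // stands in for Operator

    static final String PREFIX = "TextObjectIndex-";

    /** Replaces "/TextObjectIndex-N BMC EMC" with "(text N) Tj"; other tokens pass through. */
    static List<Object> rewrite(List<Object> tokens, Map<Integer, String> tjStrings) {
        List<Object> out = new ArrayList<>();
        boolean skipNextEmc = false;
        for (Object token : tokens) {
            if (token instanceof Op op) {
                Object last = out.isEmpty() ? null : out.get(out.size() - 1);
                if (op.value().equals("BMC") && last instanceof Name n && n.value().startsWith(PREFIX)) {
                    int index = Integer.parseInt(n.value().substring(PREFIX.length()));
                    out.remove(out.size() - 1);      // drop the marker name operand
                    out.add(tjStrings.get(index));   // the stored text string
                    out.add(new Op("Tj"));
                    skipNextEmc = true;
                    continue;
                }
                if (op.value().equals("EMC") && skipNextEmc) {
                    skipNextEmc = false;             // drop only the placeholder's EMC
                    continue;
                }
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> tokens = List.of(
                new Op("BT"), new Name("TextObjectIndex-0"), new Op("BMC"), new Op("EMC"), new Op("ET"));
        System.out.println(rewrite(tokens, Map.of(0, "Hello")));
    }
}
```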

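1. This is the background image.

2. This is the black text without the background image.

3. This is the input document, in which the text is visible.

4. This is the output, in which the text is invisible, i.e. silver.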

java pdfbox
1 answer (0 votes)

I'm using PDFBox 2.0+, so I added the following operators in the constructor of my PDFStreamEngine subclass:

addOperator(new SetStrokingColorSpace());
addOperator(new SetNonStrokingColorSpace());
addOperator(new SetStrokingDeviceCMYKColor());
addOperator(new SetNonStrokingDeviceCMYKColor());
addOperator(new SetNonStrokingDeviceRGBColor());
addOperator(new SetStrokingDeviceRGBColor());
addOperator(new SetNonStrokingDeviceGrayColor());
addOperator(new SetStrokingDeviceGrayColor());
addOperator(new SetStrokingColor());
addOperator(new SetStrokingColorN());
addOperator(new SetNonStrokingColor());
addOperator(new SetNonStrokingColorN());
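The reason this matters is that a PDFStreamEngine only updates its graphics state for operators it has registered; per this answer, in PDFBox 2.0 the color operators must be added explicitly, otherwise getGraphicsState().getNonStrokingColor() keeps its initial value. The toy engine below is not PDFBox code; it only models that dispatch with a map from operator name to handler (rg is the operator handled by SetNonStrokingDeviceRGBColor), starting from a black DeviceGray fill as in the PDF initial graphics state. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Toy model (not PDFBox code): operators without a registered handler are skipped,
 * so the state they would change keeps its initial value.
 */
final class ToyStreamEngine {
    private final Map<String, Consumer<float[]>> handlers = new HashMap<>();

    // Initial fill color: DeviceGray 0 (black), as in the PDF initial graphics state.
    float[] nonStrokingColor = {0f};

    void addOperator(String name, Consumer<float[]> handler) {
        handlers.put(name, handler);
    }

    /** Processes "operand ... operator" tokens; numeric tokens are collected as operands. */
    void process(List<String> tokens) {
        List<Float> operands = new ArrayList<>();
        for (String token : tokens) {
            if (token.matches("-?\\d+(\\.\\d+)?")) {
                operands.add(Float.parseFloat(token));
                continue;
            }
            float[] values = new float[operands.size()];
            for (int i = 0; i < values.length; i++) {
                values[i] = operands.get(i);
            }
            operands.clear();
            Consumer<float[]> handler = handlers.get(token);
            if (handler != null) {
                handler.accept(values); // registered operator: state is updated
            }                           // unregistered operator: silently ignored
        }
    }

    public static void main(String[] args) {
        ToyStreamEngine plain = new ToyStreamEngine();
        plain.process(List.of("1", "0", "0", "rg"));
        System.out.println(Arrays.toString(plain.nonStrokingColor));      // [0.0]

        ToyStreamEngine withColor = new ToyStreamEngine();
        withColor.addOperator("rg", vals -> withColor.nonStrokingColor = vals);
        withColor.process(List.of("1", "0", "0", "rg"));
        System.out.println(Arrays.toString(withColor.nonStrokingColor));  // [1.0, 0.0, 0.0]
    }
}
```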

Then extract the information you need from getGraphicsState(). In particular, see the text extraction section of https://pdfbox.apache.org/2.0/migration.html.
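Once the color operators are registered, the fill color from getGraphicsState().getNonStrokingColor() is a PDColor whose getComponents() values only make sense together with its color space. PDColor.toRGB() should already do this conversion in PDFBox 2.0; the sketch below just makes explicit what the components mean in the three device color spaces, so the result can be compared with a background color. The helper name is hypothetical, the color-space strings are assumed to match the device space names, and the CMYK branch is the naive formula rather than PDFBox's ICC-based conversion, so its output can differ.

```java
/** Hypothetical helper: packs device color components into a 0xRRGGBB int. */
final class ColorComponents {

    static int toPackedRgb(String colorSpace, float[] c) {
        float r, g, b;
        switch (colorSpace) {
            case "DeviceGray":
                r = g = b = c[0];
                break;
            case "DeviceRGB":
                r = c[0]; g = c[1]; b = c[2];
                break;
            case "DeviceCMYK":
                // naive conversion; PDFBox itself uses an ICC profile for DeviceCMYK
                r = (1 - c[0]) * (1 - c[3]);
                g = (1 - c[1]) * (1 - c[3]);
                b = (1 - c[2]) * (1 - c[3]);
                break;
            default:
                throw new IllegalArgumentException("unsupported color space: " + colorSpace);
        }
        return (to8Bit(r) << 16) | (to8Bit(g) << 8) | to8Bit(b);
    }

    /** Clamps a 0..1 component and scales it to 0..255. */
    private static int to8Bit(float v) {
        return Math.round(Math.max(0f, Math.min(1f, v)) * 255f);
    }

    public static void main(String[] args) {
        System.out.printf("%06X%n", toPackedRgb("DeviceGray", new float[]{0.75f}));          // BFBFBF
        System.out.printf("%06X%n", toPackedRgb("DeviceCMYK", new float[]{0f, 0f, 0f, 1f})); // 000000
    }
}
```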
