For some reason, the background color on the first page is incorrect

Example file: here

Problem: I am trying to determine whether text is visible on a page. To do that, I save the path and color of every fill command, like this:

    public class FillNonZeroRule extends OperatorProcessor {
        @Override
        public final void process(Operator operator, List<COSBase> operands) throws IOException {
            PDGraphicsState gs = getGraphicsState();    
            linePath.setWindingRule(GeneralPath.WIND_NON_ZERO);
            addFillPath(gs.getNonStrokingColor());
            linePath.reset();
        }

        @Override
        public String getName() {
            return "f";
        }
    }

    void addFillPath(PDColor color) {
        filledPaths.put((GeneralPath)linePath.clone(), color);
    }

And this is how I later look up the background for each character:

    private PDColor getCharacterBackgroundColor(TextPosition text) {
        PDColor color = null;           
        for (Map.Entry<GeneralPath, PDColor> filledPath : filledPaths.entrySet()) {
            Vector center = getTextPositionCenterPoint(text);
            if (filledPath.getKey().contains(lowerLeftX + center.getX(), lowerLeftY + center.getY())) {
                color = filledPath.getValue();                  
            }
        }

        return color;
    }

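In addition, I save the color for every text position. Then I try to determine whether the background color is the same as the character color. Interestingly, on the first page the background color and the text color of the heading (the top line, which has a background) are both 2301728 (int RGB value), which is incorrect. On the second page, however, the text color is 2301728 and the background color is 14145754 (correct!). So my question is: what causes the wrong background color on the first page? Thanks in advance!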

The full class is as follows:

public class PdfToTextInfoConverter extends PDFTextStripper {

    private int rotation = 0;

    private float lowerLeftX = 0;

    private float lowerLeftY = 0;

    private PDPage page = null;

    private GeneralPath linePath;

    private Map<GeneralPath, PDColor> filledPaths;

    private Map<TextPosition, PDColor> nonStrokingColors;

    public PdfToTextInfoConverter(PDDocument pddfDoc) throws IOException {
        addOperator(new SetStrokingColorSpace());
        addOperator(new SetNonStrokingColorSpace());
        addOperator(new SetNonStrokingColorN());
        addOperator(new SetStrokingColor());
        addOperator(new SetNonStrokingColor());
        addOperator(new SetStrokingDeviceGrayColor());
        addOperator(new SetNonStrokingDeviceGrayColor());
        addOperator(new SetStrokingDeviceRGBColor());
        addOperator(new SetNonStrokingDeviceRGBColor());
        addOperator(new SetStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceCMYKColor());

        addOperator(new AppendRectangleToPath());
        addOperator(new ClipEvenOddRule());
        addOperator(new ClipNonZeroRule());
        addOperator(new ClosePath());
        addOperator(new CurveTo());
        addOperator(new CurveToReplicateFinalPoint());
        addOperator(new CurveToReplicateInitialPoint());
        addOperator(new EndPath());
        addOperator(new FillEvenOddAndStrokePath());
        addOperator(new FillEvenOddRule());
        addOperator(new FillNonZeroAndStrokePath());
        addOperator(new FillNonZeroRule());
        addOperator(new LineTo());
        addOperator(new MoveTo());
        addOperator(new StrokePath());
        document = pddfDoc;
    }

    public void stripPage(int pageNum, int resolution) throws IOException {
        this.setStartPage(pageNum + 1);
        this.setEndPage(pageNum + 1);
        page = document.getPage(pageNum);
        rotation = page.getRotation();
        linePath = new GeneralPath();
        filledPaths = new LinkedHashMap<>();
        nonStrokingColors = new HashMap<>();    
        Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
        writeText(document, dummy); // This call starts the parsing process and calls writeString repeatedly.
    }

    @Override
    public void processPage(PDPage page) throws IOException {
        PDRectangle pageSize = page.getCropBox();

        lowerLeftX = pageSize.getLowerLeftX();
        lowerLeftY = pageSize.getLowerLeftY();

        super.processPage(page);
    }

    private Integer getCharacterBackgroundColor(TextPosition text) {
        Integer fillColorRgb = null;
        try {           
            for (Map.Entry<GeneralPath, PDColor> filledPath : filledPaths.entrySet()) {
                Vector center = getTextPositionCenterPoint(text);
                if (filledPath.getKey().contains(lowerLeftX + center.getX(), lowerLeftY + center.getY())) {
                    fillColorRgb = filledPath.getValue().toRGB();                   
                }
            }
        } catch (IOException e) {
            logger.error("Could not convert color to RGB", e);
        }
        return fillColorRgb;
    }

    private int getCharacterColor(TextPosition text) {
        int colorRgb = 0; // assume it's black even if we could not convert to RGB
        try {
            colorRgb = nonStrokingColors.get(text).toRGB();         
        } catch (IOException e) {
            logger.error("Could not convert color to RGB", e);
        }
        return colorRgb;
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        PDGraphicsState gs = getGraphicsState();
        // check opacity for stroke and fill text 
        if (gs.getAlphaConstant() < Constants.EPSILON && gs.getNonStrokeAlphaConstant() < Constants.EPSILON) {
            return;
        }                       

        Vector center = getTextPositionCenterPoint(text);
        Area area = gs.getCurrentClippingPath();
        if (area == null || area.contains(lowerLeftX + center.getX(), lowerLeftY + center.getY())) {            
            nonStrokingColors.put(text, gs.getNonStrokingColor());
            super.processTextPosition(text);
        }
    }

    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
        for (TextPosition text : textPositions) {           
            Integer characterColor = getCharacterColor(text);
            Integer characterBackgroundColor = getCharacterBackgroundColor(text);
        }
    }

    private Vector getTextPositionCenterPoint(TextPosition text) {
        Matrix textMatrix = text.getTextMatrix();
        Vector start = textMatrix.transform(new Vector(0, 0));
        Vector center = null;
        switch (rotation) {
        case 0:
            center = new Vector(start.getX() + text.getWidth()/2, start.getY()); 
            break;
        case 90:
            center = new Vector(start.getX(), start.getY() + text.getWidth()/2);
            break;
        case 180:
            center = new Vector(start.getX() - text.getWidth()/2, start.getY());
            break;
        case 270:
            center = new Vector(start.getX(), start.getY() - text.getWidth()/2);
            break;
        default:
            center = new Vector(start.getX() + text.getWidth()/2, start.getY());
            break;
        }

        return center;
    }

    void addFillPath(PDColor color) {
        filledPaths.put((GeneralPath)linePath.clone(), color);
    }
}
java pdf pdfbox
1 Answer (2 votes)

This is a bug in PDFBox.

(Well, there are also issues in your code, but the cause of the problem at hand lies in PDFBox.)

The bug

The problem is that the PDColor.toRGB() call in

fillColorRgb = filledPath.getValue().toRGB();

damages the color value itself for the color in question!

The color space in question is a Separation color space. Thus, PDColor.toRGB() calls PDSeparation.toRGB(float[]) with its components member as the argument.

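If the RGB value for the given argument is not yet cached in the color space, PDSeparation.toRGB(float[]) evaluates the tintTransform for that argument. For the color space in question, the tint transform is a PDFunctionType0 instance. Thus, PDFunctionType0.eval(float[]) is called.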

Unfortunately, PDFunctionType0.eval(float[]) assumes that it may use its array parameter input for its own purposes:

input[i] = clipToRange(input[i], domain.getMin(), domain.getMax());
input[i] = interpolate(input[i], domain.getMin(), domain.getMax(), 
        encodeValues.getMin(), encodeValues.getMax());
input[i] = clipToRange(input[i], 0, sizeValues[i] - 1);

But this array is the original components member of the PDColor. Thus, this evaluation changes the single component of the color object from 0.172 to 43.688.

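Later toRGB calls on that color find that 43.688 (or some other value resulting from further unwanted changes) is far beyond the maximum of 1.0, so they clip it to 1.0 and convert from there. The color with component 1.0 in that color space, however, happens to be the color used for the foreground text. Thus, your code thinks that background and foreground are the same.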
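The aliasing effect described above can be reproduced without PDFBox. The following is only a minimal sketch: `AliasingBugDemo`, `TintColor`, and `evalInPlace` are hypothetical stand-ins, not PDFBox classes, and the scale factor of 254 assumes a sample table with 255 entries (an assumption chosen because 0.172 × 254 = 43.688, the value mentioned above).

```java
import java.util.Arrays;

// Hypothetical stand-ins that only mimic the aliasing pattern; not PDFBox code.
class AliasingBugDemo {

    /** Mimics a function that reuses its input array as scratch space. */
    static float[] evalInPlace(float[] input) {
        // Clip the tint to [0, 1], then scale it onto an assumed index range [0, 254].
        // The result is written back into the caller's array.
        input[0] = Math.max(0f, Math.min(1f, input[0])) * 254f;
        // Return the tint that was effectively used for the lookup.
        return new float[] { input[0] / 254f };
    }

    /** Mimics a color object that hands its internal array to the function. */
    static final class TintColor {
        final float[] components;

        TintColor(float... components) {
            this.components = components;
        }

        float[] convert() {
            return evalInPlace(components); // passes the internal array, not a copy
        }
    }

    public static void main(String[] args) {
        TintColor background = new TintColor(0.172f);

        // First conversion: the result is still fine, but the color object is now corrupted.
        System.out.println("first result:  " + Arrays.toString(background.convert()));
        System.out.println("components:    " + Arrays.toString(background.components));

        // Second conversion: the corrupted component is clipped to 1.0 before the lookup.
        System.out.println("second result: " + Arrays.toString(background.convert()));
    }
}
```

In this sketch the first conversion looks correct and only later conversions of the same color object go wrong, which is the same shape as the failure described in the answer.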

A work-around

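To work around this, the method PDFunctionType0.eval(float[]) should be rewritten so that it does not write to its parameter array. A quick way to do that is to add

input = input.clone();

at the top of the method PDFunctionType0.eval(float[]).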
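The effect of such a defensive copy can be illustrated in isolation. This is a hedged sketch, not the actual PDFBox method: `DefensiveCopyDemo` and `evalOnCopy` are hypothetical names, and the factor 254 is an assumed sample-table range.

```java
import java.util.Arrays;

// Hypothetical illustration of the defensive-copy idea; not the PDFBox implementation.
class DefensiveCopyDemo {

    /** Same scratch-space computation as a mutating version, but on a private copy. */
    static float[] evalOnCopy(float[] input) {
        input = input.clone(); // from here on, only the local copy is modified
        input[0] = Math.max(0f, Math.min(1f, input[0])) * 254f;
        return new float[] { input[0] / 254f };
    }

    public static void main(String[] args) {
        float[] components = { 0.172f };

        float[] first = evalOnCopy(components);
        float[] second = evalOnCopy(components);

        // The caller's array is untouched, so repeated evaluations agree.
        System.out.println("components: " + Arrays.toString(components));
        System.out.println("first:      " + Arrays.toString(first));
        System.out.println("second:     " + Arrays.toString(second));
    }
}
```

Cloning costs one small array allocation per call, which seems an acceptable price for keeping the caller's color object immutable in practice.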

An issue in your code

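Your method getTextPositionCenterPoint uses the page rotation to determine the center of the baseline of the given TextPosition. That is only correct, however, for text that is drawn upright after the page rotation has been applied, and this is not the case for your document. Thus, you need to analyze the text matrix to find the actual direction of the text instead of relying on the page rotation.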

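This does not make much of a difference in your case, though, because the TextPosition.getWidth() value you use as the character width is also calculated relative to the page rotation. Since the page in question is not rotated but the text direction is rotated by 90°, TextPosition.getWidth() always returns 0. You may want to use getWidthDirAdj() instead.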
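Deriving the baseline center from the text matrix rather than from the page rotation could look roughly like the sketch below. It is plain Java with hypothetical names (`BaselineCenterDemo`, `directionDegrees`, `baselineCenter`). It assumes that `a` and `b` are the first-row entries of the text matrix, i.e. the image of the unit x vector (in PDFBox these would presumably come from `Matrix.getScaleX()` and `Matrix.getShearY()`), and that `width` is the glyph width measured along the writing direction, such as the value of `getWidthDirAdj()`.

```java
// Hypothetical helper, independent of PDFBox; inputs are assumed to come from the text matrix.
class BaselineCenterDemo {

    /** Text direction in degrees, normalized to [0, 360), from the matrix entries a and b. */
    static double directionDegrees(double a, double b) {
        double degrees = Math.toDegrees(Math.atan2(b, a));
        return (degrees + 360.0) % 360.0;
    }

    /** Moves half the glyph width from the baseline start along the writing direction. */
    static double[] baselineCenter(double startX, double startY,
                                   double a, double b, double width) {
        double angle = Math.atan2(b, a);
        return new double[] {
            startX + Math.cos(angle) * width / 2.0,
            startY + Math.sin(angle) * width / 2.0
        };
    }

    public static void main(String[] args) {
        // Text rotated by 90 degrees on an unrotated page: the unit x vector maps to (0, 1).
        double[] center = baselineCenter(100.0, 200.0, 0.0, 1.0, 10.0);
        System.out.println("direction: " + directionDegrees(0.0, 1.0));
        System.out.println("center:    " + center[0] + ", " + center[1]);
    }
}
```

Unlike the switch over the page rotation, this approach also covers text drawn at angles other than multiples of 90°.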
