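I am trying to replace text in a PDF with Apache PDFBox using the code below. The text replacement itself works, but some parts of the PDF go missing at the same time. After calling replaceTextInSecond, I only do the following:
document.save("filename.pdf");
document.close();
Could you help me find out what is causing this? Thanks in advance!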
private static PDDocument replaceTextInSecond(PDDocument document, String searchString, String replacement) {
    PDPage page = document.getPage(1);
    PDFStreamParser parser;
    try {
        parser = new PDFStreamParser(page);
        parser.parse();
        List<?> tokens = parser.getTokens();
        for (int j = 0; j < tokens.size(); j++) {
            Object next = tokens.get(j);
            if (next instanceof Operator) {
                Operator op = (Operator) next;
                String pstring = "";
                int prej = 0;
                if (op.getName().equals("Tj")) {
                    COSString previous = (COSString) tokens.get(j - 1);
                    String string = previous.getString();
                    //System.out.println("string :::: " +string);
                    string = string.replaceFirst(searchString, replacement);
                    previous.setValue(string.getBytes());
                } else if (op.getName().equals("TJ")) {
                    COSArray previous = (COSArray) tokens.get(j - 1);
                    for (int k = 0; k < previous.size(); k++) {
                        Object arrElement = previous.getObject(k);
                        if (arrElement instanceof COSString) {
                            COSString cosString = (COSString) arrElement;
                            String string = cosString.getString();
                            //System.out.println("string :::: " +string);
                            if (j == prej) {
                                pstring += string;
                            } else {
                                prej = j;
                                pstring = string;
                            }
                        }
                    }
                    if (searchString.equals(pstring.trim())) {
                        COSString cosString2 = (COSString) previous.getObject(0);
                        cosString2.setValue(replacement.getBytes());
                        int total = previous.size() - 1;
                        for (int k = total; k > 0; k--) {
                            previous.remove(k);
                        }
                    }
                }
            }
        }
        PDStream updatedStream = new PDStream(document);
        OutputStream out = updatedStream.createOutputStream(COSName.FLATE_DECODE);
        ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
        tokenWriter.writeTokens(tokens);
        out.close();
        page.setContents(updatedStream);
        //System.out.println("replaced " +searchString + " with " + replacement);
    }
    catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return document;
}
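I found the problem:
previous.setValue(string.getBytes());
was the culprit. Changing the code to
if(string.contains(searchString)){
    string = string.replaceFirst(searchString, replacement);
    previous.setValue(string.getBytes());
}
solved it. It looks like this has to do with how the PDF's text is encoded or something similar: without the check, every Tj string on the page was rewritten, even when it did not contain the search text.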
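To illustrate why rewriting untouched strings might lose content, here is a small sketch in plain Java with no PDFBox involved. It assumes, purely for illustration, that ISO-8859-1 stands in for the decoding done by getString() and UTF-8 for the charset used by getBytes(); the real charsets depend on PDFBox and on the platform default. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Decode raw string bytes to text and encode them back, similar in spirit to
    // previous.setValue(previous.getString().getBytes()).
    static byte[] roundTrip(byte[] raw) {
        // Assumption: ISO-8859-1 is only a stand-in for the decoding in getString().
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        // Assumption: UTF-8 is only a stand-in for the platform default used by getBytes().
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // True when the bytes come back unchanged after the decode/encode round trip.
    static boolean survivesRoundTrip(byte[] raw) {
        return Arrays.equals(raw, roundTrip(raw));
    }

    public static void main(String[] args) {
        byte[] ascii = {'H', 'e', 'l', 'l', 'o'};
        // Hypothetical non-ASCII codes, e.g. character codes of a custom-encoded font.
        byte[] fontCodes = {0x00, 0x2B, (byte) 0x80, (byte) 0xE9};

        System.out.println(survivesRoundTrip(ascii));     // true
        System.out.println(survivesRoundTrip(fontCodes)); // false
    }
}
```

If something like this is what happens, plain ASCII strings come back unchanged, while strings holding other byte values are written back with different bytes and may no longer map to the intended glyphs. That would be consistent with text disappearing only in some parts of the page.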
Anyway, thanks for your help! :)
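The guard from the fix can be sketched as a small stand-alone helper. This is only one possible shape, not part of the original code: the class GuardedReplace and the method replaceFirstLiteral are hypothetical names. It also quotes both arguments, because String.replaceFirst treats the search text as a regular expression and gives `$` and `\` a special meaning in the replacement.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuardedReplace {
    // Returns the rewritten text only when searchString actually occurs in text.
    // An empty Optional means "leave the original token untouched".
    static Optional<String> replaceFirstLiteral(String text, String searchString, String replacement) {
        if (!text.contains(searchString)) {
            return Optional.empty();
        }
        // Quote both arguments so characters such as '(' or '$' are taken literally.
        return Optional.of(text.replaceFirst(
                Pattern.quote(searchString),
                Matcher.quoteReplacement(replacement)));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstLiteral("Total (USD)", "(USD)", "$100")); // Optional[Total $100]
        System.out.println(replaceFirstLiteral("unrelated", "(USD)", "$100"));   // Optional.empty
    }
}
```

In the Tj branch, the idea would be to call previous.setValue(...) only when the returned Optional is present, so strings that do not contain the search text keep their original bytes.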