我刚刚发现Apache POI库对于使用Java编辑Word文件非常有用。具体来说,我想使用Apache POI的XWPF类编辑DOCX文件。我发现没有适当的方法/文档,我可以这样做。有人可以分步说明,如何替换DOCX文件中的一些文本。
**文本可以在行/段落或表格行/列中
提前致谢 :)
你需要的方法是XWPFRun.setText(String)。只需按照您的方式浏览文件,直到找到感兴趣的XWPFRun,找出您想要的新文本并替换它。 (运行是具有相同格式的文本序列)
你应该可以这样做:
XWPFDocument doc = new XWPFDocument(OPCPackage.open("input.docx"));
for (XWPFParagraph p : doc.getParagraphs()) {
List<XWPFRun> runs = p.getRuns();
if (runs != null) {
for (XWPFRun r : runs) {
String text = r.getText(0);
if (text != null && text.contains("needle")) {
text = text.replace("needle", "haystack");
r.setText(text, 0);
}
}
}
}
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (text != null && text.contains("needle")) {
text = text.replace("needle", "haystack");
r.setText(text,0);
}
}
}
}
}
}
doc.write(new FileOutputStream("output.docx"));
以下是我们使用Apache POI替换文本所做的工作。我们发现替换整个XWPFParagraph的文本而不是运行是不值得的麻烦和简单。由于Microsoft Word负责在文档段落中创建运行的位置,因此可以在单词的中间随机分割运行。因此,您可能正在搜索的文本可能是一次运行的一半,另一半运行的一半。使用段落的全文,删除其现有的运行,并添加带有调整后的文本的新运行似乎解决了文本替换的问题。
但是,在段落级别进行替换需要付出代价;你丢失了该段落中的运行格式。例如,如果在段落的中间你加粗了“bits”这个单词,那么在解析文件时你用“bytes”替换了“bits”这个单词,“bytes”这个单词将不再是粗体。因为粗体存储了一个在段落的整个文本体被替换时被删除的运行。附加的代码有一个注释掉的部分,如果需要,它可以在运行级别替换文本。
还应注意,如果要插入的文本包含\ n返回字符,则以下内容有效。我们找不到一种方法来插入返回而不为返回之前的每个部分创建一个运行并标记运行addCarriageReturn()。干杯
package com.healthpartners.hcss.client.external.word.replacement;
import java.util.List;
import org.apache.commons.lang.StringUtils;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class TextReplacer {
private String searchValue;
private String replacement;
public TextReplacer(String searchValue, String replacement) {
this.searchValue = searchValue;
this.replacement = replacement;
}
public void replace(XWPFDocument document) {
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph xwpfParagraph : paragraphs) {
replace(xwpfParagraph);
}
}
private void replace(XWPFParagraph paragraph) {
if (hasReplaceableItem(paragraph.getText())) {
String replacedText = StringUtils.replace(paragraph.getText(), searchValue, replacement);
removeAllRuns(paragraph);
insertReplacementRuns(paragraph, replacedText);
}
}
private void insertReplacementRuns(XWPFParagraph paragraph, String replacedText) {
String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacedText, "\n");
for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
String part = replacementTextSplitOnCarriageReturn[j];
XWPFRun newRun = paragraph.insertNewRun(j);
newRun.setText(part);
if (j+1 < replacementTextSplitOnCarriageReturn.length) {
newRun.addCarriageReturn();
}
}
}
private void removeAllRuns(XWPFParagraph paragraph) {
int size = paragraph.getRuns().size();
for (int i = 0; i < size; i++) {
paragraph.removeRun(0);
}
}
private boolean hasReplaceableItem(String runText) {
return StringUtils.contains(runText, searchValue);
}
//REVISIT The below can be removed if Michele tests and approved the above less versatile replacement version
// private void replace(XWPFParagraph paragraph) {
// for (int i = 0; i < paragraph.getRuns().size() ; i++) {
// i = replace(paragraph, i);
// }
// }
// private int replace(XWPFParagraph paragraph, int i) {
// XWPFRun run = paragraph.getRuns().get(i);
//
// String runText = run.getText(0);
//
// if (hasReplaceableItem(runText)) {
// return replace(paragraph, i, run);
// }
//
// return i;
// }
// private int replace(XWPFParagraph paragraph, int i, XWPFRun run) {
// String runText = run.getCTR().getTArray(0).getStringValue();
//
// String beforeSuperLong = StringUtils.substring(runText, 0, runText.indexOf(searchValue));
//
// String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacement, "\n");
//
// String afterSuperLong = StringUtils.substring(runText, runText.indexOf(searchValue) + searchValue.length());
//
// Counter counter = new Counter(i);
//
// insertNewRun(paragraph, run, counter, beforeSuperLong);
//
// for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
// String part = replacementTextSplitOnCarriageReturn[j];
//
// XWPFRun newRun = insertNewRun(paragraph, run, counter, part);
//
// if (j+1 < replacementTextSplitOnCarriageReturn.length) {
// newRun.addCarriageReturn();
// }
// }
//
// insertNewRun(paragraph, run, counter, afterSuperLong);
//
// paragraph.removeRun(counter.getCount());
//
// return counter.getCount();
// }
// private class Counter {
// private int i;
//
// public Counter(int i) {
// this.i = i;
// }
//
// public void increment() {
// i++;
// }
//
// public int getCount() {
// return i;
// }
// }
// private XWPFRun insertNewRun(XWPFParagraph xwpfParagraph, XWPFRun run, Counter counter, String newText) {
// XWPFRun newRun = xwpfParagraph.insertNewRun(counter.i);
// newRun.getCTR().set(run.getCTR());
// newRun.getCTR().getTArray(0).setStringValue(newText);
//
// counter.increment();
//
// return newRun;
// }
我的任务是用单词docx文档中的地图值替换格式$ {key}的文本。上述解决方案是一个很好的起点,但未考虑所有情况:$ {key}不仅可以在多个运行中传播,还可以在运行中的多个文本中传播。因此,我最终得到以下代码:
private void replace(String inFile, Map<String, String> data, OutputStream out) throws Exception, IOException {
XWPFDocument doc = new XWPFDocument(OPCPackage.open(inFile));
for (XWPFParagraph p : doc.getParagraphs()) {
replace2(p, data);
}
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
replace2(p, data);
}
}
}
}
doc.write(out);
}
private void replace2(XWPFParagraph p, Map<String, String> data) {
String pText = p.getText(); // complete paragraph as string
if (pText.contains("${")) { // if paragraph does not include our pattern, ignore
TreeMap<Integer, XWPFRun> posRuns = getPosToRuns(p);
Pattern pat = Pattern.compile("\\$\\{(.+?)\\}");
Matcher m = pat.matcher(pText);
while (m.find()) { // for all patterns in the paragraph
String g = m.group(1); // extract key start and end pos
int s = m.start(1);
int e = m.end(1);
String key = g;
String x = data.get(key);
if (x == null)
x = "";
SortedMap<Integer, XWPFRun> range = posRuns.subMap(s - 2, true, e + 1, true); // get runs which contain the pattern
boolean found1 = false; // found $
boolean found2 = false; // found {
boolean found3 = false; // found }
XWPFRun prevRun = null; // previous run handled in the loop
XWPFRun found2Run = null; // run in which { was found
int found2Pos = -1; // pos of { within above run
for (XWPFRun r : range.values())
{
if (r == prevRun)
continue; // this run has already been handled
if (found3)
break; // done working on current key pattern
prevRun = r;
for (int k = 0;; k++) { // iterate over texts of run r
if (found3)
break;
String txt = null;
try {
txt = r.getText(k); // note: should return null, but throws exception if the text does not exist
} catch (Exception ex) {
}
if (txt == null)
break; // no more texts in the run, exit loop
if (txt.contains("$") && !found1) { // found $, replace it with value from data map
txt = txt.replaceFirst("\\$", x);
found1 = true;
}
if (txt.contains("{") && !found2 && found1) {
found2Run = r; // found { replace it with empty string and remember location
found2Pos = txt.indexOf('{');
txt = txt.replaceFirst("\\{", "");
found2 = true;
}
if (found1 && found2 && !found3) { // find } and set all chars between { and } to blank
if (txt.contains("}"))
{
if (r == found2Run)
{ // complete pattern was within a single run
txt = txt.substring(0, found2Pos)+txt.substring(txt.indexOf('}'));
}
else // pattern spread across multiple runs
txt = txt.substring(txt.indexOf('}'));
}
else if (r == found2Run) // same run as { but no }, remove all text starting at {
txt = txt.substring(0, found2Pos);
else
txt = ""; // run between { and }, set text to blank
}
if (txt.contains("}") && !found3) {
txt = txt.replaceFirst("\\}", "");
found3 = true;
}
r.setText(txt, k);
}
}
}
System.out.println(p.getText());
}
}
private TreeMap<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
int pos = 0;
TreeMap<Integer, XWPFRun> map = new TreeMap<Integer, XWPFRun>();
for (XWPFRun run : paragraph.getRuns()) {
String runText = run.text();
if (runText != null && runText.length() > 0) {
for (int i = 0; i < runText.length(); i++) {
map.put(pos + i, run);
}
pos += runText.length();
}
}
return map;
}
如果有人还需要保持文本的格式,这段代码效果更好。
private static Map<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
int pos = 0;
Map<Integer, XWPFRun> map = new HashMap<Integer, XWPFRun>(10);
for (XWPFRun run : paragraph.getRuns()) {
String runText = run.text();
if (runText != null) {
for (int i = 0; i < runText.length(); i++) {
map.put(pos + i, run);
}
pos += runText.length();
}
}
return (map);
}
public static <V> void replace(XWPFDocument document, Map<String, V> map) {
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs) {
replace(paragraph, map);
}
}
public static <V> void replace(XWPFDocument document, String searchText, V replacement) {
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs) {
replace(paragraph, searchText, replacement);
}
}
private static <V> void replace(XWPFParagraph paragraph, Map<String, V> map) {
for (Map.Entry<String, V> entry : map.entrySet()) {
replace(paragraph, entry.getKey(), entry.getValue());
}
}
public static <V> void replace(XWPFParagraph paragraph, String searchText, V replacement) {
boolean found = true;
while (found) {
found = false;
int pos = paragraph.getText().indexOf(searchText);
if (pos >= 0) {
found = true;
Map<Integer, XWPFRun> posToRuns = getPosToRuns(paragraph);
XWPFRun run = posToRuns.get(pos);
XWPFRun lastRun = posToRuns.get(pos + searchText.length() - 1);
int runNum = paragraph.getRuns().indexOf(run);
int lastRunNum = paragraph.getRuns().indexOf(lastRun);
String texts[] = replacement.toString().split("\n");
run.setText(texts[0], 0);
XWPFRun newRun = run;
for (int i = 1; i < texts.length; i++) {
newRun.addCarriageReturn();
newRun = paragraph.insertNewRun(runNum + i);
/*
We should copy all style attributes
to the newRun from run
also from background color, ...
Here we duplicate only the simple attributes...
*/
newRun.setText(texts[i]);
newRun.setBold(run.isBold());
newRun.setCapitalized(run.isCapitalized());
// newRun.setCharacterSpacing(run.getCharacterSpacing());
newRun.setColor(run.getColor());
newRun.setDoubleStrikethrough(run.isDoubleStrikeThrough());
newRun.setEmbossed(run.isEmbossed());
newRun.setFontFamily(run.getFontFamily());
newRun.setFontSize(run.getFontSize());
newRun.setImprinted(run.isImprinted());
newRun.setItalic(run.isItalic());
newRun.setKerning(run.getKerning());
newRun.setShadow(run.isShadowed());
newRun.setSmallCaps(run.isSmallCaps());
newRun.setStrikeThrough(run.isStrikeThrough());
newRun.setSubscript(run.getSubscript());
newRun.setUnderline(run.getUnderline());
}
for (int i = lastRunNum + texts.length - 1; i > runNum + texts.length - 1; i--) {
paragraph.removeRun(i);
}
}
}
}
第一块代码给我一个NullPointerException,有谁知道什么是错的?
run.getText(int position) - 来自文档:返回:此文本的文本运行,如果未设置则返回null
在调用contains()之前,只需检查它是否为null
顺便说一句,如果你想要替换你需要的文本,你需要将它设置在你获得它的位置,在这种情况下r.setText(text,0);.否则将添加文本而不替换
这里接受的答案需要一个更新以及Justin Skiles更新。 r.setText(text,0);原因:如果没有使用pos变量更新setText,则输出将是旧字符串和替换字符串的组合。
截至撰写之日,没有一个答案正确替换。
Gagravars答案不包括要替换的单词在运行中分开的情况; Thierry Boduins解决方案有时会在替换其他单词后留下替换空白的单词,也不会检查表格。
使用Gagtavars答案作为基础我还检查了当前运行之前的运行,如果两个运行的文本都包含要替换的单词,则添加else块。我在kotlin的补充:
if (text != null) {
if (text.contains(findText)) {
text = text.replace(findText, replaceText)
r.setText(text, 0)
} else if (i > 0 && p.runs[i - 1].getText(0).plus(text).contains(findText)) {
val pos = p.runs[i - 1].getText(0).indexOf('$')
text = textOfNotFullSecondRun(text, findText)
r.setText(text, 0)
val findTextLengthInFirstRun = findTextPartInFirstRun(p.runs[i - 1].getText(0), findText)
val prevRunText = p.runs[i - 1].getText(0).replaceRange(pos, findTextLengthInFirstRun, replaceText)
p.runs[i - 1].setText(prevRunText, 0)
}
}
private fun textOfNotFullSecondRun(text: String, findText: String): String {
return if (!text.contains(findText)) {
textOfNotFullSecondRun(text, findText.drop(1))
} else {
text.replace(findText, "")
}
}
private fun findTextPartInFirstRun(text: String, findText: String): Int {
return if (text.contains(findText)) {
findText.length
} else {
findTextPartInFirstRun(text, findText.dropLast(1))
}
}
它是段落中的运行列表。与表中的搜索块相同。有了这个解决方案,我还没有任何问题。所有格式都完好无损。
编辑:我做了一个java lib替换,检查出来:https://github.com/deividasstr/docx-word-replacer
有replaceParagraph
实现用${key}
(value
参数)替换fieldsForReport
并通过合并runs
内容${key}
来保存格式。
private void replaceParagraph(XWPFParagraph paragraph, Map<String, String> fieldsForReport) throws POIXMLException {
String find, text, runsText;
List<XWPFRun> runs;
XWPFRun run, nextRun;
for (String key : fieldsForReport.keySet()) {
text = paragraph.getText();
if (!text.contains("${"))
return;
find = "${" + key + "}";
if (!text.contains(find))
continue;
runs = paragraph.getRuns();
for (int i = 0; i < runs.size(); i++) {
run = runs.get(i);
runsText = run.getText(0);
if (runsText.contains("${") || (runsText.contains("$") && runs.get(i + 1).getText(0).substring(0, 1).equals("{"))) {
while (!runsText.contains("}")) {
nextRun = runs.get(i + 1);
runsText = runsText + nextRun.getText(0);
paragraph.removeRun(i + 1);
}
run.setText(runsText.contains(find) ?
runsText.replace(find, fieldsForReport.get(key)) :
runsText, 0);
}
}
}
}
我建议我在#之间替换文本的解决方案,例如:这个#bookmark#应该被替换。它被替换为:
此外,它考虑了符号#和书签在分开的运行中的情况(在不同的运行之间替换变量)。
链接到代码:https://gist.github.com/aerobium/bf02e443c079c5caec7568e167849dda