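I'm working on a legacy product in which I need to implement some functionality. I'm trying to use apache-poi 5.2.2 to add comments to an existing Word document based on search criteria. Basically, if a word in the docx document matches the original text defined in an action, a comment needs to be added.
I have already been able to add comments to the document.
However, I am unable to add the start and end of the comment range (around the text that needs to be commented). I assume this also requires some kind of markup. For example, when I use a document with pre-existing comments, I noticed that the text at that location looks like this: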
<w:commentRangeStart w:id="0"/><w:r><w:rPr><w:b/><w:sz w:val="27"/></w:rPr><w:t>Júlio</w:t></w:r><w:r><w:rPr><w:b/><w:spacing w:val="-3"/><w:sz w:val="27"/></w:rPr><w:t xml:space="preserve"> </w:t></w:r><w:r><w:rPr><w:b/><w:sz w:val="27"/></w:rPr><w:t>César</w:t></w:r><w:commentRangeEnd w:id="0"/><w:r w:rsidR="00B43339"><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r><w:r>
As far as I can tell, the XML for the comment itself lives in a separate CommentsDocument, like this:
//<xml-fragment w:id="0" w:author="<NAME OF COMMENT CREATOR>" w:date="2024-03-13T10:11:00Z" w:initials="<HERE COME THE INITIALS>" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" 
xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
// <w:p w14:paraId="4A10B938" w14:textId="77777777" w:rsidR="00B43339" w:rsidRDefault="00B43339" w:rsidP="00B43339">
// <w:r>
// <w:rPr>
// <w:rStyle w:val="CommentReference"/>
// </w:rPr>
// <w:annotationRef/>
// </w:r>
// <w:r>
// <w:rPr>
// <w:color w:val="000000"/>
// <w:sz w:val="20"/>
// <w:szCs w:val="20"/>
// </w:rPr>
// <w:t>This is a pre-annotation existing comment</w:t>
// </w:r>
// </w:p>
//</xml-fragment>
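Taking into account some of these posts (adding a comment to a specific word or run in a docx document using Apache POI), I tried a few approaches:
I guess the solution needs a combination of both. For now I want to focus on adding a comment to a specific word.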
if(paragraph.getText().contains(action.getOriginalText())){
    //SINCE NOT THE ENTIRE PARAGRAPH NEEDS TO BE ANNOTATED, WE NEED TO LOOK AT THE RUNS INSIDE THE PARAGRAPH
    for(int runIndex = internalParagraphRunIndex; runIndex < paragraph.getRuns().size(); runIndex++) {
        XWPFRun run = paragraph.getRuns().get(runIndex);
        if (run.text().equals(action.getOriginalText())) {
            //THE ENTIRE RUN NEEDS TO BE ANNOTATED
            throw new RuntimeException("Not yet implemented");
        } else if (run.text().contains(action.getOriginalText())) {
            System.out.println("Part of the run needs to be annotated");
            //THE TEXT THAT NEEDS TO BE ANNOTATED IS PART OF THE RUN
            //Getting the comments from the document
            XWPFComments comments = wordDocument.getDocComments();
            CTComments existingCtComments = comments.getCtComments();
            //Creating CTComment
            CTComment newCTComment = existingCtComments.addNewComment();
            newCTComment.setId(getCommentId(existingCtComments));
            String[] splittedText = splitRunTextIntoParts(run, action.getOriginalText());
            int indexOfTextThatNeedsToBeAnnotatedInSplittedText = findLocationOfTextInSplittedText(splittedText, action.getOriginalText());
            for (int z = 0; z < splittedText.length; z++) {
                if (indexOfTextThatNeedsToBeAnnotatedInSplittedText == -1) {
                    throw new RuntimeException("The text that needs to be annotated is not found in the splitted run.");
                } else {
                    if (z == indexOfTextThatNeedsToBeAnnotatedInSplittedText) {
                        //the exact word that needs to be annotated
                        XWPFRun runToInsert = paragraph.insertNewRun(runIndex + z);
                        //insert part of the text of the run
                        runToInsert.setText(splittedText[z]);
                        paragraph.getCTP().addNewCommentRangeEnd().setId(newCTComment.getId());
                        //add the comment reference AFTER the text
                        runToInsert.getCTR().addNewCommentReference().setId(newCTComment.getId());
                        //TODO: remove styling
                        runToInsert.setBold(true);
                    } else if (z == indexOfTextThatNeedsToBeAnnotatedInSplittedText - 1) {
                        XWPFRun runToInsert = paragraph.insertNewRun(runIndex + z);
                        //insert part of the text of the run
                        runToInsert.setText(splittedText[z]);
                        //add the range start after the text of the run
                        CTMarkupRange rangeStartMarkupRange = paragraph.getCTP().addNewCommentRangeStart();
                        rangeStartMarkupRange.setId(newCTComment.getId());
                        newCTComment.setCommentRangeStartArray(new CTMarkupRange[]{rangeStartMarkupRange});
                        //TODO: remove styling
                        runToInsert.setItalic(true);
                    } else if (z == indexOfTextThatNeedsToBeAnnotatedInSplittedText + 1) {
                        //add the range end before the text of the run
                        CTMarkupRange rangeEndeMarkupRange = paragraph.getCTP().addNewCommentRangeEnd();
                        rangeEndeMarkupRange.setId(newCTComment.getId());
                        newCTComment.setCommentRangeEndArray(new CTMarkupRange[]{rangeEndeMarkupRange});
                        //newCTComment.setCommentRangeEndArray(new CTMarkupRange[]{rangeEndeMarkupRange});
                        //insert new run
                        XWPFRun runToInsert = paragraph.insertNewRun(runIndex + z);
                        //insert part of the text of the run
                        runToInsert.setText(splittedText[z]);
                        //TODO: remove styling
                        runToInsert.setUnderline(UnderlinePatterns.SINGLE);
                    }
                }
            }
            //remove original run
            paragraph.removeRun(runIndex + splittedText.length); //the new runs are put in front of the old run
            //Creating the new XWPFComment based on the CTComment
            XWPFComment newComment = new XWPFComment(newCTComment, comments);
            newComment.setAuthor("Teradactor");
            newComment.setDate(new GregorianCalendar());
            newComment.setInitials("TD");
            newComment.createParagraph().createRun().setText(action.getAnnotationText());
            comments.createComment(BigInteger.valueOf(Long.parseLong(newComment.getId())));
            setParagraphIndex(paragraphToLookAt);
            setInternalParagraphRunIndex(runIndex + 3); // the next time we want to start from the runs after th, since this one is already annotated
            setActionFound(true);
            //because we are splitting the run into several runs, we need to decrease the index of the tnsmap text
            break;
        }
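This approach successfully adds the comment reference after the word that needs to be annotated. I can also see that the text before or after the word is styled (italic or underlined) and that the word itself is bold. However, the word itself is not properly marked as commented. It would be good to know whether additional markup is needed there and, if so, how to set it.
So your main question is:
How to comment a single text part in an existing Word document (whether it is already a single text run of its own or sits inside a longer text run)?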
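That is very broad. Too broad to answer here. To get a single text part as its own text run, see "How to highlight replaced word using Apache POI". Also read "Values 'name' and 'surname' are not read, apache poi" and "Apache POI: ${my_placeholder} is treated as three different runs", because these provide a bug fix for XWPFParagraph.searchText.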
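As a rough illustration of only the first step (locating the part of a run's text that should become its own run), here is a minimal sketch in plain Java. The class RunTextSplitter and the method splitAround are hypothetical names, not part of Apache POI, and the sketch assumes the searched text lies entirely inside one run's text. Creating the actual runs, copying their formatting and handling matches that span several runs are covered by the linked answers and are not shown.

```java
final class RunTextSplitter {

    /**
     * Splits text into {before, match, after} around the first occurrence of target.
     * Returns null when target is empty or does not occur in text.
     * Hypothetical helper for illustration only; not an Apache POI method.
     */
    static String[] splitAround(String text, String target) {
        if (target.isEmpty()) {
            return null;
        }
        int start = text.indexOf(target);
        if (start < 0) {
            return null;
        }
        int end = start + target.length();
        return new String[] { text.substring(0, start), target, text.substring(end) };
    }

    public static void main(String[] args) {
        // The middle part is the text that would become its own run and get the comment.
        String[] parts = splitAround("some run text here", "run text");
        System.out.println(String.join("|", parts));
    }
}
```

Each of the three parts would then go into a run of its own, and only the middle run would be commented.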
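Leaving aside the question of how to get a single text part as its own text run, the question of how to comment a single text run can be answered as follows:
Each commented text run looks like this in word/document.xml: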
...
<w:commentRangeStart w:id="1"/>
<w:r>
    <w:rPr>
        ...
    </w:rPr>
    <w:t>run text</w:t>
</w:r>
<w:commentRangeEnd w:id="1"/>
<w:r>
    <w:commentReference w:id="1"/>
</w:r>
...
There is a commentRangeStart before the text run and a commentRangeEnd after the text run, immediately followed by a text run that contains only the commentReference.
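To make the element order concrete without any Apache POI dependency, the following sketch builds the same structure on a tiny paragraph fragment with the JDK's DOM API. It only illustrates the XML layout and is not how XWPF does it. The class CommentRangeDomSketch and its methods are hypothetical, and the input is assumed to be a single w:p element without whitespace between its children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class CommentRangeDomSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Creates an element such as <w:commentRangeStart w:id="1"/>.
    static Element marker(Document doc, String localName, String id) {
        Element element = doc.createElementNS(W_NS, "w:" + localName);
        element.setAttributeNS(W_NS, "w:id", id);
        return element;
    }

    // Puts commentRangeStart before the run, then commentRangeEnd and a reference run after it.
    static void wrapRun(Element run, String id) {
        Document doc = run.getOwnerDocument();
        Node parent = run.getParentNode();
        parent.insertBefore(marker(doc, "commentRangeStart", id), run);
        Node afterRun = run.getNextSibling(); // null means "append at the end"
        parent.insertBefore(marker(doc, "commentRangeEnd", id), afterRun);
        Element referenceRun = doc.createElementNS(W_NS, "w:r");
        referenceRun.appendChild(marker(doc, "commentReference", id));
        parent.insertBefore(referenceRun, afterRun);
    }

    // Local name of an element, plus "#id" when it carries a w:id attribute.
    static String describe(Element element) {
        String id = element.getAttributeNS(W_NS, "id");
        return id.isEmpty() ? element.getLocalName() : element.getLocalName() + "#" + id;
    }

    // Parses a paragraph, wraps its first run and returns a summary of the paragraph's children.
    static String wrapFirstRun(String paragraphXml, String id) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(paragraphXml)));
            wrapRun((Element) doc.getElementsByTagNameNS(W_NS, "r").item(0), id);
            StringBuilder summary = new StringBuilder();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element)) continue;
                if (summary.length() > 0) summary.append(' ');
                summary.append(describe((Element) n));
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c instanceof Element) summary.append('[').append(describe((Element) c)).append(']');
                }
            }
            return summary.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not process paragraph XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:p xmlns:w=\"" + W_NS + "\"><w:r><w:t>run text</w:t></w:r></w:p>";
        System.out.println(wrapFirstRun(xml, "1"));
    }
}
```

The summary lists the paragraph's children in document order, so the start marker, the commented run, the end marker and the reference run can be read off directly.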
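To create this, we need to use the org.openxmlformats.schemas.wordprocessingml.x2006.main.* classes and native XML methods (such as org.apache.xmlbeans.XmlCursor), because Apache POI does not provide methods for doing this in XWPF.
Complete example, which simply comments every single text run to show that it works.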
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlCursor;
import java.math.BigInteger;
import java.util.GregorianCalendar;
import java.util.Locale;

public class WordCommentTextRuns {

    //method to get or create the CommentsDocument /word/comments.xml in the *.docx ZIP archive
    private static XWPFComments createCommentsDocument(XWPFDocument document) throws Exception {
        XWPFComments commentsDocument = null;
        //trying to get the CommentsDocument
        commentsDocument = document.getDocComments();
        //create a new CommentsDocument if there is not one already
        if (commentsDocument == null) {
            commentsDocument = document.createComments();
            System.out.println("comments document created");
        }
        return commentsDocument;
    }

    //method to get the next comment Id from CTComments
    private static BigInteger getCommentId(CTComments comments) {
        BigInteger cId = BigInteger.ZERO;
        for (CTComment ctComment : comments.getCommentList()) {
            //compareTo only guarantees the sign of its result, so test for > 0
            if (ctComment.getId().compareTo(cId) > 0) {
                cId = ctComment.getId();
            }
        }
        cId = cId.add(BigInteger.ONE);
        return cId;
    }

    //method to set CommentRangeStart before text run
    private static CTMarkupRange insertCommentRangeStartBefore(XWPFRun run) {
        String uri = CTMarkupRange.type.getName().getNamespaceURI();
        String localPart = "commentRangeStart";
        XmlCursor cursor = run.getCTR().newCursor();
        cursor.beginElement(localPart, uri);
        cursor.toParent();
        CTMarkupRange commentRangeStart = (CTMarkupRange)cursor.getObject();
        return commentRangeStart;
    }

    //method to set CommentRangeEnd after text run
    private static CTMarkupRange insertCommentRangeEndAfter(XWPFRun run) {
        String uri = CTMarkupRange.type.getName().getNamespaceURI();
        String localPart = "commentRangeEnd";
        XmlCursor cursor = run.getCTR().newCursor();
        //move behind the end of the run element
        cursor.toEndToken();
        cursor.toNextToken();
        cursor.beginElement(localPart, uri);
        cursor.toParent();
        CTMarkupRange commentRangeEnd = (CTMarkupRange)cursor.getObject();
        return commentRangeEnd;
    }

    //method to set CommentReference after CommentRangeEnd
    private static void insertCommentReferenceAfter(CTMarkupRange commentRangeEnd, BigInteger cId) {
        String uri = CTR.type.getName().getNamespaceURI();
        String localPart = "r";
        XmlCursor cursor = commentRangeEnd.newCursor();
        //move behind the end of the commentRangeEnd element
        cursor.toEndToken();
        cursor.toNextToken();
        cursor.beginElement(localPart, uri);
        cursor.toParent();
        CTR ctr = (CTR)cursor.getObject();
        ctr.addNewCommentReference().setId(cId);
    }

    //method to comment single text runs
    private static void commentTextRun(XWPFRun run, CTComments comments, String commentText) {
        CTComment ctComment;
        //comment for the run
        BigInteger cId = getCommentId(comments);
        ctComment = comments.addNewComment();
        ctComment.setAuthor("Axel Ríchter");
        ctComment.setInitials("AR");
        ctComment.setDate(new GregorianCalendar(Locale.US));
        ctComment.addNewP().addNewR().addNewT().setStringValue(commentText);
        ctComment.setId(cId);
        //set CommentRangeStart
        CTMarkupRange commentRangeStart = insertCommentRangeStartBefore(run);
        commentRangeStart.setId(cId);
        //set CommentRangeEnd and CommentReference
        CTMarkupRange commentRangeEnd = insertCommentRangeEndAfter(run);
        commentRangeEnd.setId(cId);
        insertCommentReferenceAfter(commentRangeEnd, cId);
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument(new FileInputStream("./WordDocument.docx"));
        XWPFComments commentsDocument = createCommentsDocument(document);
        CTComments comments = commentsDocument.getCtComments();
        for (XWPFParagraph paragraph : document.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                // simply comment each single text run to show that it works
                commentTextRun(run, comments, "Comment text");
            }
        }
        FileOutputStream out = new FileOutputStream("./WordDocumentWithComments.docx");
        document.write(out);
        out.close();
        document.close();
    }
}
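One way to sanity-check a result is to extract word/document.xml from the *.docx ZIP archive and look for comment references whose id has no matching commentRangeStart and commentRangeEnd, which is the symptom described in the question. The following is a minimal sketch using only the JDK's DOM API. The class CommentMarkupCheck and the method unanchoredIds are hypothetical helpers, and the sketch assumes the XML is passed in as a string and that every comment is meant to be anchored to a range as in the structure shown above.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CommentMarkupCheck {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Collects the w:id values of all elements with the given local name.
    static Set<String> idsOf(Document doc, String localName) {
        Set<String> ids = new TreeSet<>();
        NodeList nodes = doc.getElementsByTagNameNS(W_NS, localName);
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(((Element) nodes.item(i)).getAttributeNS(W_NS, "id"));
        }
        return ids;
    }

    // Returns the ids that have a commentReference but lack a commentRangeStart or commentRangeEnd.
    static Set<String> unanchoredIds(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(documentXml)));
            Set<String> anchored = idsOf(doc, "commentRangeStart");
            anchored.retainAll(idsOf(doc, "commentRangeEnd"));
            Set<String> unanchored = idsOf(doc, "commentReference");
            unanchored.removeAll(anchored);
            return unanchored;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document XML", e);
        }
    }

    public static void main(String[] args) {
        // A reference run without range markers, as produced by the code in the question.
        String xml = "<w:p xmlns:w=\"" + W_NS + "\"><w:r><w:t>run text</w:t></w:r>"
                + "<w:r><w:commentReference w:id=\"1\"/></w:r></w:p>";
        System.out.println(unanchoredIds(xml));
    }
}
```

An empty result means every referenced comment id also has both range markers somewhere in the document; it does not check that the markers are in the right order.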