Bidirectional text in a Word document with Apache POI


I am trying to add some Hebrew text to a Word document. It works, but as soon as I add punctuation, the text gets jumbled.

This is the code I run:

public static void main(String[] args) throws Exception {

    XWPFDocument document = new XWPFDocument();
    XWPFParagraph paragraph = document.createParagraph();

    paragraph.setAlignment(ParagraphAlignment.LEFT);

    // make RTL direction
    CTP ctp = paragraph.getCTP();
    CTPPr ctppr;
    if ((ctppr = ctp.getPPr()) == null) {
        ctppr = ctp.addNewPPr();
    }
    ctppr.addNewBidi().setVal(STOnOff.ON);

    XWPFRun run = paragraph.createRun();
    run.setText("שלום עולם !");

    // create the document in the specific path by giving it a name
    File newFile = new File("helloWorld.docx");

    // insert document to newFile
    try {
        FileOutputStream output = new FileOutputStream(newFile);
        document.write(output);
        output.close();
        document.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

This is the "helloWorld.docx" I get:

screenshot

And this is how it should look:

screenshot

In addition, I would like the whole document to be RTL (even though it is a bidirectional document), not just specific paragraphs.

Thanks for your help!

java ms-word apache-poi docx bidirectional
1 Answer

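This is a well-known issue with bidirectional text. The exclamation mark and the space are not right-to-left characters in themselves; they are directionally neutral. So where needed, we have to mark them, using U+200F RIGHT-TO-LEFT MARK (RLM). See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types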
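To see why the space and the exclamation mark drift while the Hebrew letters do not, it can help to print the bidirectional class of each character. The following is a minimal sketch that uses only the JDK's `Character.getDirectionality`; the class name `BidiClassDemo` and the helper `bidiClass` are made up for illustration and are not part of Apache POI.

```java
final class BidiClassDemo {

    // Maps the JDK directionality constants to short labels.
    // Only the classes relevant to this question are covered.
    static String bidiClass(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                return "R";
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return "L";
            case Character.DIRECTIONALITY_WHITESPACE:
                return "WS";
            case Character.DIRECTIONALITY_OTHER_NEUTRALS:
                return "ON";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(bidiClass('\u05E9')); // Hebrew letter shin: R
        System.out.println(bidiClass(' '));      // space: WS (neutral)
        System.out.println(bidiClass('!'));      // exclamation mark: ON (neutral)
        System.out.println(bidiClass('\u200F')); // RIGHT-TO-LEFT MARK: R
    }
}
```

The RLM is an invisible character with a strong right-to-left class, which is why placing it next to a neutral character lets that character resolve as right-to-left.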

The following code works for me:

import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STOnOff;

public class CreateWordRTLParagraph {

 public static void main(String[] args) throws Exception {

  XWPFDocument doc= new XWPFDocument();

  XWPFParagraph paragraph = doc.createParagraph();
  CTP ctp = paragraph.getCTP();
  CTPPr ctppr;
  if ((ctppr = ctp.getPPr()) == null) ctppr = ctp.addNewPPr();
  ctppr.addNewBidi().setVal(STOnOff.ON);

  XWPFRun run = paragraph.createRun();
  run.setText("שלום עולם \u200F!\u200F");

  FileOutputStream out = new FileOutputStream("WordDocument.docx");
  doc.write(out);
  out.close();
  doc.close();

 }
}

Note the \u200F marks after the space and after the exclamation mark.

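If the text lines come from a file, marking single characters is not good practice. Instead, the whole text line should be marked as right-to-left text. To do this, we can embed the text line between U+202B RIGHT-TO-LEFT EMBEDDING (RLE) and U+202C POP DIRECTIONAL FORMATTING (PDF).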

Example:

import java.io.File;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STOnOff;

import java.util.List;

public class CreateWordRTLParagraphsFromFile {

 public static void main(String[] args) throws Exception {

  List<String> lines = Files.readAllLines(new File("HebrewTextFile.txt").toPath(), StandardCharsets.UTF_8);

  XWPFDocument doc= new XWPFDocument();

  for (String line : lines) {

   XWPFParagraph paragraph = doc.createParagraph();
   CTP ctp = paragraph.getCTP();
   CTPPr ctppr = ctp.getPPr();
   if (ctppr == null) ctppr = ctp.addNewPPr();
   ctppr.addNewBidi().setVal(STOnOff.ON);

   XWPFRun run = paragraph.createRun();
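   // U+202B is RIGHT-TO-LEFT EMBEDDING (RLE); U+202E would be the override (RLO), which also reverses embedded numbers and Latin text
   run.setText("\u202B" + line + "\u202C");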

  }

  FileOutputStream out = new FileOutputStream("WordDocument.docx");
  doc.write(out);
  out.close();
  doc.close();

 }
}
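
The wrapping step from the example above can be factored into a small helper so the control characters are named in one place. This is only a sketch under assumptions: the class `RtlEmbedding` and the method `embedRtl` are hypothetical names, and leaving empty lines unwrapped is a design choice made here, not something the answer prescribes.

```java
final class RtlEmbedding {

    static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Wraps a line in RLE ... PDF so the whole line is treated as right-to-left text.
    // Empty lines are returned unchanged (an assumption of this sketch).
    static String embedRtl(String line) {
        if (line.isEmpty()) {
            return line;
        }
        return RLE + line + PDF;
    }

    public static void main(String[] args) {
        String wrapped = embedRtl("abc 123!");
        // Print the code points of the first and last character of the wrapped line.
        System.out.printf("U+%04X ... U+%04X%n",
                (int) wrapped.charAt(0),
                (int) wrapped.charAt(wrapped.length() - 1));
    }
}
```

Inside the loop over the lines, `run.setText(RtlEmbedding.embedRtl(line))` would then take the place of the inline string concatenation.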