Open XML - Find and replace multiple placeholders in a document template

Question

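I know there are many posts on this topic, but none of them seems to handle this particular problem. I'm trying to build a small, generic document generator POC, and I'm using Open XML.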

Here is the code:

```csharp
using System.IO;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;

private static void ReplacePlaceholders<T>(string templateDocumentPath, T templateObject)
    where T : class
{
    using (var templateDocument = WordprocessingDocument.Open(templateDocumentPath, true))
    {
        // Read the raw XML of the main document part.
        string templateDocumentText;
        using (var streamReader = new StreamReader(templateDocument.MainDocumentPart.GetStream()))
        {
            templateDocumentText = streamReader.ReadToEnd();
        }

        // Replace every occurrence of each property name with the property's value.
        foreach (var prop in templateObject.GetType().GetProperties())
        {
            var regexText = new Regex(Regex.Escape(prop.Name));
            string value = prop.GetValue(templateObject)?.ToString() ?? string.Empty;
            templateDocumentText = regexText.Replace(templateDocumentText, value);
        }

        // Overwrite the main document part with the modified XML.
        using var streamWriter = new StreamWriter(templateDocument.MainDocumentPart.GetStream(FileMode.Create));
        streamWriter.Write(templateDocumentText);
    }
}
```

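The code works as intended. The problem is the following:

In the XML returned by `StreamReader.ReadToEnd()`, my placeholders are split between tags, so my `Replace` method only replaces the words that are not split.

In this case, my code searches for the word "Firstname" but finds "irstname", so it does not replace it.
Is there a way to scan the whole .docx word by word and replace the placeholders?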
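To make the failure concrete, here is a minimal C# sketch that uses only System.Xml.Linq and a hypothetical two-run paragraph from `word/document.xml`. It runs a regex against the raw XML, as the code above does, and then reads the concatenated `w:t` values of the paragraph instead:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical paragraph in which Word split the placeholder "Firstname"
// into the two runs "F" and "irstname".
const string xml =
    "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
    "<w:r><w:t>F</w:t></w:r>" +
    "<w:r><w:t>irstname</w:t></w:r>" +
    "</w:p>";

// Matching against the raw XML fails: the tags sit between "F" and "irstname".
bool rawMatch = new Regex("Firstname").IsMatch(xml);

// Concatenating the w:t values of the paragraph recovers the visible text.
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
string visibleText = string.Concat(
    XElement.Parse(xml).Descendants(w + "t").Select(t => t.Value));

Console.WriteLine($"Match in raw XML: {rawMatch}"); // prints "Match in raw XML: False"
Console.WriteLine($"Visible text: {visibleText}");  // prints "Visible text: Firstname"
```

Concatenating the text finds the placeholder, but by itself it loses the information about which run each character came from, and that mapping is what a formatting-preserving replacement needs.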
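(Edit)
Partial solution/workaround
What I found: you have to type the placeholder into the .docx in one go, without editing it afterwards. For example, if I type "firstname" and later change it to "Firstname", Word splits the word into "F" and "irstname". If I don't edit it, the word is not split.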

Answer 1

TLDR

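In short, the solution to your problem is to use the `OpenXmlRegex` utility class of Open-Xml-PowerTools, as demonstrated in the unit test further below.

Why?

With Open XML, the same text can be represented in many different ways. If Microsoft Word was involved in creating that Open XML markup, the edits made to produce the text play an important role, because Word keeps track of which edits were made in which editing session. For example, the `w:p` (`Paragraph`) elements shown in the following two extreme scenarios represent exactly the same text. Anything in between those two examples is possible, so any real solution must be able to deal with all of them.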

Extreme scenario 1: Single w:r and w:t element

The following markup is easy:

```xml
<w:p>
  <w:r>
    <w:t>Firstname</w:t>
  </w:r>
</w:p>
```

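Extreme scenario 2: Single-character w:r and w:t elements

While you would not typically find the following markup, it represents the theoretical extreme in which each character has its own `w:r` and `w:t` element:

```xml
<w:p>
  <w:r><w:t>F</w:t></w:r>
  <w:r><w:t>i</w:t></w:r>
  <w:r><w:t>r</w:t></w:r>
  <w:r><w:t>s</w:t></w:r>
  <w:r><w:t>t</w:t></w:r>
  <w:r><w:t>n</w:t></w:r>
  <w:r><w:t>a</w:t></w:r>
  <w:r><w:t>m</w:t></w:r>
  <w:r><w:t>e</w:t></w:r>
</w:p>
```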

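Why did I show this extreme example if it does not occur in practice, you might ask? The answer is that it plays a crucial role in the solution if you want to roll your own.

How to roll your own?

To do it right, you have to:

1. transform the runs (`w:r`) of your paragraph (`w:p`) into single-character runs (i.e., `w:r` elements with a single-character `w:t` or one `w:sym` each), retaining the run properties (`w:rPr`);
2. perform the search-and-replace operation on those single-character runs (using a few other tricks); and
3. taking into account the potentially different run properties (`w:rPr`) of the runs resulting from the search-and-replace operation, transform those runs back into the fewest "coalesced" runs needed to represent the text and its formatting.

When replacing text, you should not lose or change the formatting of text that is not affected by the replacement. You should also not remove unaffected fields or content controls (`w:sdt`). Oh, and by the way, don't forget revision markup such as `w:ins` and `w:del`...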
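The three steps above can be sketched in C# with nothing but System.Xml.Linq. This is a simplified illustration under explicit assumptions, not the Open-Xml-PowerTools implementation. Instead of materializing single-character `w:r` elements, it keeps one (character, `w:rPr`) item per character, which carries the same information. Only runs that are direct children of the paragraph and contain nothing but `w:rPr` and `w:t` count as text; every other child (`w:pPr`, `w:sym` or `w:tab` runs, `w:proofErr`, bookmarks, fields, `w:sdt`, `w:hyperlink`, `w:ins`/`w:del`) is kept unchanged and acts as a barrier a match cannot span. The replacement inherits the `w:rPr` of the first matched character. The function names `Explode`, `ReplaceInItems`, `Implode` and `ReplaceInParagraph` are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
const char Opaque = '\uFFFC'; // stands in for any child that is not a plain text run

// Step 1: one item per character, carrying the w:rPr of the run it came from.
// Children that are not plain text runs become a single opaque item each.
List<(char Ch, XElement? RPr, XElement? Other)> Explode(XElement paragraph)
{
    var items = new List<(char Ch, XElement? RPr, XElement? Other)>();
    foreach (XElement child in paragraph.Elements())
    {
        bool isTextRun = child.Name == w + "r" &&
            child.Elements().All(e => e.Name == w + "rPr" || e.Name == w + "t");
        if (!isTextRun)
        {
            items.Add((Opaque, null, child));
            continue;
        }
        XElement? rPr = child.Element(w + "rPr");
        foreach (char c in string.Concat(child.Elements(w + "t").Select(t => t.Value)))
            items.Add((c, rPr, null));
    }
    return items;
}

// Step 2: search the concatenated text; each match becomes characters that inherit
// the w:rPr of the first matched character. Returns the number of replacements.
int ReplaceInItems(List<(char Ch, XElement? RPr, XElement? Other)> items, Regex regex, string replacement)
{
    string text = new string(items.Select(i => i.Ch).ToArray());
    int count = 0;
    // Right to left, so splicing one match does not shift the indices of earlier ones.
    foreach (Match m in regex.Matches(text).Cast<Match>().Reverse())
    {
        if (m.Length == 0 || m.Value.Contains(Opaque)) continue; // never swallow markup
        XElement? rPr = items[m.Index].RPr;
        string newText = m.Result(replacement); // expands $1, ${name}, ... like Regex.Replace
        items.RemoveRange(m.Index, m.Length);
        items.InsertRange(m.Index, newText.Select(c => (c, rPr, (XElement?)null)));
        count++;
    }
    return count;
}

// Step 3: coalesce consecutive characters with equal w:rPr into as few runs as possible.
void Implode(XElement paragraph, List<(char Ch, XElement? RPr, XElement? Other)> items)
{
    var children = new List<XElement>();
    var buffer = new StringBuilder();
    XElement? bufferRPr = null;

    void Flush()
    {
        if (buffer.Length == 0) return;
        children.Add(new XElement(w + "r",
            bufferRPr is null ? null : new XElement(bufferRPr),
            new XElement(w + "t", new XAttribute(XNamespace.Xml + "space", "preserve"), buffer.ToString())));
        buffer.Clear();
    }

    foreach (var item in items)
    {
        if (item.Other is not null)
        {
            Flush();
            children.Add(item.Other); // opaque markup is kept as-is
            continue;
        }
        if (buffer.Length == 0 || !XNode.DeepEquals(bufferRPr, item.RPr))
        {
            Flush();
            bufferRPr = item.RPr;
        }
        buffer.Append(item.Ch);
    }
    Flush();
    paragraph.ReplaceNodes(children);
}

void ReplaceInParagraph(XElement paragraph, Regex regex, string replacement)
{
    var items = Explode(paragraph);
    if (ReplaceInItems(items, regex, replacement) > 0)
        Implode(paragraph, items); // paragraphs without a match keep their original markup
}

// Usage: the split placeholder from the question, followed by a symbol run.
var sample = XElement.Parse(
    "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
    "<w:r><w:rPr><w:i/></w:rPr><w:t xml:space='preserve'>Hello </w:t></w:r>" +
    "<w:r><w:rPr><w:b/></w:rPr><w:t>F</w:t></w:r>" +
    "<w:r><w:rPr><w:b/></w:rPr><w:t>irstname</w:t></w:r>" +
    "<w:r><w:sym w:font='Wingdings' w:char='F04A'/></w:r>" +
    "</w:p>");
ReplaceInParagraph(sample, new Regex("Firstname"), "Bernie");
Console.WriteLine(sample);
```

Running the sketch should print the paragraph with three runs: italic "Hello ", bold "Bernie" and the unchanged Wingdings symbol run. Treating non-text markup as a barrier is the conservative choice: it never deletes a symbol, field or content control, at the cost of missing placeholders that such markup interrupts (for example, a `w:proofErr` element between "F" and "irstname").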
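Why not roll your own?
The good news is that you don't have to roll your own. The `OpenXmlRegex` utility class of Eric White's Open-Xml-PowerTools implements the above algorithm (and more). I have used it successfully in large-scale RFP and contracting scenarios and have also contributed back to it.
How to use Open-Xml-PowerTools?
In this section, I'll demonstrate how to use Open-Xml-PowerTools to replace the placeholder text "Firstname" (as in the question) with various first names (using "Bernie" in the sample output document).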
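Sample input document
Let's first look at the following sample document, which is created by the unit test shown further below. Note that it contains formatted runs and a symbol. As in the question, the placeholder "Firstname" is split into two runs, "F" and "irstname".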
```xml
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:rPr>
          <w:i />
        </w:rPr>
        <w:t xml:space="preserve">Hello </w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>F</w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>irstname</w:t>
      </w:r>
      <w:r>
        <w:t xml:space="preserve"> </w:t>
      </w:r>
      <w:r>
        <w:sym w:font="Wingdings" w:char="F04A" />
      </w:r>
    </w:p>
  </w:body>
</w:document>
```
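Desired output document
The following is the document after "Firstname" has been correctly replaced with "Bernie". Note that the formatting is retained and that the symbol is not lost.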
```xml
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:rPr>
          <w:i />
        </w:rPr>
        <w:t xml:space="preserve">Hello </w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>Bernie</w:t>
      </w:r>
      <w:r>
        <w:t xml:space="preserve"> </w:t>
      </w:r>
      <w:r>
        <w:sym w:font="Wingdings" w:char="F04A" />
      </w:r>
    </w:p>
  </w:body>
</w:document>
```
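Sample usage
Next, here is a complete unit test that demonstrates how to use the `OpenXmlRegex.Replace()` method. Note that the example shows only one of several overloads. The unit test also demonstrates that the replacement works:

- while retaining the formatting of the placeholder;
- without losing the formatting of other runs; and
- without losing symbols (or any other markup, such as fields or content controls).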
```csharp
// Members of the OpenXmlRegexTests class (xUnit, Open XML SDK, Open-Xml-PowerTools).
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;
using Xunit;

[Theory]
[InlineData("1 Run", "Firstname", new[] { "Firstname" }, "Albert")]
[InlineData("2 Runs", "Firstname", new[] { "F", "irstname" }, "Bernie")]
[InlineData("9 Runs", "Firstname", new[] { "F", "i", "r", "s", "t", "n", "a", "m", "e" }, "Charly")]
public void Replace_PlaceholderInOneOrMoreRuns_SuccessfullyReplaced(
    string example,
    string propName,
    IEnumerable<string> runTexts,
    string replacement)
{
    // Create a test WordprocessingDocument on a MemoryStream.
    using MemoryStream stream = CreateWordprocessingDocument(runTexts);

    // Save the Word document before replacing the placeholder.
    // You can use this to inspect the input Word document.
    File.WriteAllBytes($"{example} before Replacing.docx", stream.ToArray());

    // Replace the placeholder identified by propName with the replacement text.
    using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
    {
        // Read the root element, a w:document in this case.
        // Note that GetXElement() is a shortcut for GetXDocument().Root.
        // This caches the root element and we can later write it back
        // to the main document part, using the PutXDocument() method.
        XElement document = wordDocument.MainDocumentPart.GetXElement();

        // Specify the parameters of the OpenXmlRegex.Replace() method,
        // noting that the replacement is given as a parameter.
        IEnumerable<XElement> content = document.Descendants(W.p);
        var regex = new Regex(propName);

        // Perform the replacement, thereby modifying the root element.
        OpenXmlRegex.Replace(content, regex, replacement, null);

        // Write the changed root element back to the main document part.
        wordDocument.MainDocumentPart.PutXDocument();
    }

    // Assert that we have done it right.
    AssertReplacementWasSuccessful(stream, replacement);

    // Save the Word document after having replaced the placeholder.
    // You can use this to inspect the output Word document.
    File.WriteAllBytes($"{example} after Replacing.docx", stream.ToArray());
}

private static MemoryStream CreateWordprocessingDocument(IEnumerable<string> runTexts)
{
    var stream = new MemoryStream();
    const WordprocessingDocumentType type = WordprocessingDocumentType.Document;

    using (WordprocessingDocument wordDocument = WordprocessingDocument.Create(stream, type))
    {
        MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
        mainDocumentPart.PutXDocument(new XDocument(CreateDocument(runTexts)));
    }

    return stream;
}

private static XElement CreateDocument(IEnumerable<string> runTexts)
{
    // Produce a w:document with a single w:p that contains:
    // (1) one italic run with some lead-in, i.e., "Hello " in this example;
    // (2) one or more bold runs for the placeholder, which might or might not be split;
    // (3) one run with just a space; and
    // (4) one run with a symbol (i.e., a Wingdings smiley face).
    return new XElement(W.document,
        new XAttribute(XNamespace.Xmlns + "w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main"),
        new XElement(W.body,
            new XElement(W.p,
                new XElement(W.r,
                    new XElement(W.rPr, new XElement(W.i)),
                    new XElement(W.t, new XAttribute(XNamespace.Xml + "space", "preserve"), "Hello ")),
                runTexts.Select(rt =>
                    new XElement(W.r,
                        new XElement(W.rPr, new XElement(W.b)),
                        new XElement(W.t, rt))),
                new XElement(W.r,
                    new XElement(W.t, new XAttribute(XNamespace.Xml + "space", "preserve"), " ")),
                new XElement(W.r,
                    new XElement(W.sym,
                        new XAttribute(W.font, "Wingdings"),
                        new XAttribute(W._char, "F04A"))))));
}

private static void AssertReplacementWasSuccessful(MemoryStream stream, string replacement)
{
    using WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false);
    XElement document = wordDocument.MainDocumentPart.GetXElement();

    XElement paragraph = document.Descendants(W.p).Single();
    List<XElement> runs = paragraph.Elements(W.r).ToList();

    // We have the expected number of runs, i.e., the lead-in, the first name,
    // a space character, and the symbol.
    Assert.Equal(4, runs.Count);

    // We still have the lead-in "Hello " and it is still formatted in italics.
    Assert.True(runs[0].Value == "Hello " && runs[0].Elements(W.rPr).Elements(W.i).Any());

    // We have successfully replaced our "Firstname" placeholder and the
    // concrete first name is formatted in bold, exactly like the placeholder.
    Assert.True(runs[1].Value == replacement && runs[1].Elements(W.rPr).Elements(W.b).Any());

    // We still have the space between the first name and the symbol and it
    // is unformatted.
    Assert.True(runs[2].Value == " " && !runs[2].Elements(W.rPr).Any());

    // Finally, we still have our smiley face symbol run.
    Assert.True(IsSymbolRun(runs[3], "Wingdings", "F04A"));
}

private static bool IsSymbolRun(XElement run, string fontValue, string charValue)
{
    XElement sym = run.Elements(W.sym).FirstOrDefault();
    if (sym == null) return false;

    return (string) sym.Attribute(W.font) == fontValue &&
           (string) sym.Attribute(W._char) == charValue;
}
```

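Why is `InnerText` not a solution?

While it might be tempting to use the `InnerText` property of the `Paragraph` class (or of other subclasses of the `OpenXmlElement` class), the problem is that you will ignore any non-text (non-`w:t`) markup. For example, if your paragraph contains symbols (`w:sym` elements, such as the smiley face used in the example above), those are lost because the `InnerText` property does not take them into account. The following unit test demonstrates this: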

```csharp
// Also a member of OpenXmlRegexTests; additionally requires:
using DocumentFormat.OpenXml.Wordprocessing; // Document, Paragraph

[Theory]
[InlineData("Hello Firstname ", new[] { "Firstname" })]
[InlineData("Hello Firstname ", new[] { "F", "irstname" })]
[InlineData("Hello Firstname ", new[] { "F", "i", "r", "s", "t", "n", "a", "m", "e" })]
public void InnerText_ParagraphWithSymbols_SymbolIgnored(string expectedInnerText, IEnumerable<string> runTexts)
{
    // Create Word document with smiley face symbol at the end.
    using MemoryStream stream = CreateWordprocessingDocument(runTexts);
    using WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false);

    Document document = wordDocument.MainDocumentPart.Document;
    Paragraph paragraph = document.Descendants<Paragraph>().Single();
    string innerText = paragraph.InnerText;

    // Note that the innerText does not contain the smiley face symbol.
    Assert.Equal(expectedInnerText, innerText);
}
```

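Note that in simple use cases you might not need to consider all of the above. However, if you have to deal with real-life documents or with markup changes made by Microsoft Word, chances are you cannot ignore the complexity. And wait until you have to deal with revision markup...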

As always, the full source code can be found in my CodeSnippets GitHub repository. Look for the `OpenXmlRegexTests` class.
