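I know there are many posts on this topic, but none of them seems to deal with this specific problem. I am trying to build a small proof of concept for a generic document generator. I am using Open XML.
The code is as follows: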
private static void ReplacePlaceholders<T>(string templateDocumentPath, T templateObject)
where T : class
{
using (var templateDocument = WordprocessingDocument.Open(templateDocumentPath, true))
{
string templateDocumentText = null;
using (var streamReader = new StreamReader(templateDocument.MainDocumentPart.GetStream()))
{
templateDocumentText = streamReader.ReadToEnd();
}
var props = templateObject.GetType().GetProperties();
foreach (var prop in props)
{
var regexText = new Regex($"{prop.Name}");
templateDocumentText =
regexText.Replace(templateDocumentText, prop.GetValue(templateObject).ToString());
}
using var streamWriter = new StreamWriter(templateDocument.MainDocumentPart.GetStream(FileMode.Create));
streamWriter.Write(templateDocumentText);
}
}
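The code works as expected. The problem is the following:
The text returned by StreamReader.ReadToEnd() has my placeholders split across tags, so my Replace call only replaces the words that are not split.
In this case, my code searches for the word "Firstname" but only finds "irstname", so it does not replace it. Is there a way to scan the whole .docx word by word and replace the placeholders?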
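Partial solution / workaround
What I found: you have to type the placeholder into the .docx in one go, without editing it afterwards. For example, if I write "firstname" and later change it to "Firstname", the word is split into "F" and "irstname". If it is not edited, it is not split.
The solution is to use the OpenXmlRegex utility class of the Open-Xml-PowerTools, as shown in the unit test further below.
Why?
The same text can be represented by w:p (Paragraph) elements in very different ways. The two extreme examples below represent exactly the same text. Anything between these two extremes is possible, so any real solution must be able to deal with that.
Extreme scenario 1: a single w:r and w:t element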
<w:p>
  <w:r>
    <w:t>Firstname</w:t>
  </w:r>
</w:p>
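Extreme scenario 2: single-character w:r and w:t elements
While you will typically not find the following markup in practice, it represents the theoretical limit in which each character has its own w:r and w:t element.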
<w:p>
  <w:r><w:t>F</w:t></w:r>
  <w:r><w:t>i</w:t></w:r>
  <w:r><w:t>r</w:t></w:r>
  <w:r><w:t>s</w:t></w:r>
  <w:r><w:t>t</w:t></w:r>
  <w:r><w:t>n</w:t></w:r>
  <w:r><w:t>a</w:t></w:r>
  <w:r><w:t>m</w:t></w:r>
  <w:r><w:t>e</w:t></w:r>
</w:p>
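To make the difference between the two extremes concrete, here is a minimal sketch that uses only System.Xml.Linq. The class and method names (RunTextDemo, SingleRunParagraph, SingleCharacterRunsParagraph, VisibleText, RawMarkup) are made up for this illustration and are not part of the Open XML SDK or the Open-Xml-PowerTools.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class RunTextDemo
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Extreme 1: the whole text in a single w:r/w:t pair.
    public static XElement SingleRunParagraph(string text) =>
        new XElement(W + "p",
            new XElement(W + "r", new XElement(W + "t", text)));

    // Extreme 2: one w:r/w:t pair per character.
    public static XElement SingleCharacterRunsParagraph(string text) =>
        new XElement(W + "p",
            text.Select(c =>
                new XElement(W + "r", new XElement(W + "t", c.ToString()))));

    // The text a reader sees: all w:t values concatenated in document order.
    public static string VisibleText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));

    // What a regex over the raw markup (as in the question) gets to see.
    public static string RawMarkup(XElement paragraph) =>
        paragraph.ToString(SaveOptions.DisableFormatting);
}
```

For the text "Firstname", VisibleText returns the same string for both paragraphs, but only the raw markup of the single-run paragraph contains "Firstname" as a contiguous string. That is why a plain regex over the part's XML misses split placeholders.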
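You may ask: why look at this extreme example if it does not occur in practice? The answer is that it plays an essential role in the solution, should you want to roll your own.
How to roll your own?
To do it right, you must:
- transform the runs (w:r) of your paragraph (w:p) into single-character runs (i.e., w:r elements with one single-character w:t or one w:sym each), retaining the run properties (w:rPr);
- perform the search-and-replace operation on those single-character runs (using a few other tricks); and
- taking into account the potentially different run properties (w:rPr) of the runs that result from the search-and-replace operation, transform those runs back into the smallest number of "coalesced" runs required to represent the text and its formatting.
When replacing text, you should not lose or alter the formatting of text that is not affected by the replacement. You should also not remove unaffected fields or content controls (w:sdt). And, by the way, don't forget about revision markup such as w:ins and w:del.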
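The three steps above can be sketched roughly as follows, using only System.Xml.Linq. This is a simplified illustration under stated assumptions, not the algorithm implemented by OpenXmlRegex: it assumes that the paragraph contains only w:r children (plus an optional leading w:pPr), that each run holds at most one w:t, and it ignores xml:space attributes, fields, content controls, and revision markup. All names (RunSketch, SplitRuns, ReplaceText, CoalesceRuns) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RunSketch
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a text run with a copy of the given run properties (if any).
    private static XElement TextRun(XElement rPr, string text) =>
        new XElement(W + "r",
            rPr == null ? null : new XElement(rPr),
            new XElement(W + "t", text));

    // Step 1: one run per character, each with a copy of the original w:rPr.
    // Runs without w:t (e.g., w:sym runs) are copied unchanged.
    public static List<XElement> SplitRuns(XElement paragraph)
    {
        var result = new List<XElement>();
        foreach (XElement run in paragraph.Elements(W + "r"))
        {
            XElement t = run.Element(W + "t");
            if (t == null) { result.Add(new XElement(run)); continue; }
            foreach (char c in t.Value)
                result.Add(TextRun(run.Element(W + "rPr"), c.ToString()));
        }
        return result;
    }

    // Step 2: find the search text in consecutive single-character runs and
    // replace each match with one run that carries the first matched run's w:rPr.
    // Expects the output of SplitRuns, i.e., exactly one character per text run.
    public static List<XElement> ReplaceText(
        List<XElement> runs, string search, string replacement)
    {
        // Non-text runs map to a sentinel character, so no match spans them.
        string text = string.Concat(
            runs.Select(r => r.Element(W + "t")?.Value ?? "\uFFFF"));
        var result = new List<XElement>();
        int i = 0;
        while (i < runs.Count)
        {
            bool isMatch = search.Length > 0 &&
                string.CompareOrdinal(text, i, search, 0, search.Length) == 0;
            if (isMatch)
            {
                result.Add(TextRun(runs[i].Element(W + "rPr"), replacement));
                i += search.Length;
            }
            else
            {
                result.Add(runs[i]);
                i++;
            }
        }
        return result;
    }

    // Step 3: merge adjacent text runs that have identical run properties.
    public static List<XElement> CoalesceRuns(List<XElement> runs)
    {
        var result = new List<XElement>();
        foreach (XElement run in runs)
        {
            XElement last = result.LastOrDefault();
            XElement t = run.Element(W + "t");
            XElement lastT = last?.Element(W + "t");
            bool canMerge = t != null && lastT != null &&
                XNode.DeepEquals(run.Element(W + "rPr"), last.Element(W + "rPr"));
            if (canMerge) lastT.Value += t.Value;
            else result.Add(new XElement(run));
        }
        return result;
    }

    // Applies all three steps and swaps the paragraph's runs for the result.
    public static void Replace(XElement paragraph, string search, string replacement)
    {
        List<XElement> runs =
            CoalesceRuns(ReplaceText(SplitRuns(paragraph), search, replacement));
        paragraph.Elements(W + "r").Remove();
        paragraph.Add(runs);
    }
}
```

Calling RunSketch.Replace(paragraph, "Firstname", "Bernie") on a paragraph whose placeholder is split into "F" and "irstname" yields a single run containing "Bernie" with the placeholder's formatting. Everything this sketch leaves out is exactly what makes a production-quality implementation hard.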
Why not roll your own?
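The good news is that you don't have to roll your own. The OpenXmlRegex utility class of Eric White's Open-Xml-PowerTools implements the above algorithm (and more). I have used it successfully in large-scale RFP and contracting scenarios and have also contributed back to it.
How to use the Open-Xml-PowerTools?
In this section, I will demonstrate how to use the Open-Xml-PowerTools to replace the placeholder text "Firstname" (as in the question) with various first names (using "Bernie" in the sample output document).
Sample input document
Let's first look at the following sample document, which is created by the unit test shown later. Note that it contains formatted runs and a symbol. As in the question, the placeholder "Firstname" is split into two runs, i.e., "F" and "irstname".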
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:rPr>
          <w:i />
        </w:rPr>
        <w:t xml:space="preserve">Hello </w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>F</w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>irstname</w:t>
      </w:r>
      <w:r>
        <w:t xml:space="preserve"> </w:t>
      </w:r>
      <w:r>
        <w:sym w:font="Wingdings" w:char="F04A" />
      </w:r>
    </w:p>
  </w:body>
</w:document>
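Desired output document
The following is the document that results from replacing "Firstname" with "Bernie" if you do it right. Note that the formatting is retained and that we did not lose the symbol.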
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:rPr>
          <w:i />
        </w:rPr>
        <w:t xml:space="preserve">Hello </w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>Bernie</w:t>
      </w:r>
      <w:r>
        <w:t xml:space="preserve"> </w:t>
      </w:r>
      <w:r>
        <w:sym w:font="Wingdings" w:char="F04A" />
      </w:r>
    </w:p>
  </w:body>
</w:document>
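Sample usage
Next, here is a complete unit test that demonstrates how to use the OpenXmlRegex.Replace() method. Note that the example shows only one of several overloads. The unit test also demonstrates that this works: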
// Using directives assumed by the snippets below (xUnit, Open XML SDK, Open-Xml-PowerTools).
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using OpenXmlPowerTools;
using Xunit;
[Theory]
[InlineData("1 Run", "Firstname", new[] { "Firstname" }, "Albert")]
[InlineData("2 Runs", "Firstname", new[] { "F", "irstname" }, "Bernie")]
[InlineData("9 Runs", "Firstname", new[] { "F", "i", "r", "s", "t", "n", "a", "m", "e" }, "Charly")]
public void Replace_PlaceholderInOneOrMoreRuns_SuccessfullyReplaced(
string example,
string propName,
IEnumerable<string> runTexts,
string replacement)
{
// Create a test WordprocessingDocument on a MemoryStream.
using MemoryStream stream = CreateWordprocessingDocument(runTexts);
// Save the Word document before replacing the placeholder.
// You can use this to inspect the input Word document.
File.WriteAllBytes($"{example} before Replacing.docx", stream.ToArray());
// Replace the placeholder identified by propName with the replacement text.
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
{
// Read the root element, a w:document in this case.
// Note that GetXElement() is a shortcut for GetXDocument().Root.
// This caches the root element and we can later write it back
// to the main document part, using the PutXDocument() method.
XElement document = wordDocument.MainDocumentPart.GetXElement();
// Specify the parameters of the OpenXmlRegex.Replace() method,
// noting that the replacement is given as a parameter.
IEnumerable<XElement> content = document.Descendants(W.p);
var regex = new Regex(propName);
// Perform the replacement, thereby modifying the root element.
OpenXmlRegex.Replace(content, regex, replacement, null);
// Write the changed root element back to the main document part.
wordDocument.MainDocumentPart.PutXDocument();
}
// Assert that we have done it right.
AssertReplacementWasSuccessful(stream, replacement);
// Save the Word document after having replaced the placeholder.
// You can use this to inspect the output Word document.
File.WriteAllBytes($"{example} after Replacing.docx", stream.ToArray());
}
private static MemoryStream CreateWordprocessingDocument(IEnumerable<string> runTexts)
{
var stream = new MemoryStream();
const WordprocessingDocumentType type = WordprocessingDocumentType.Document;
using (WordprocessingDocument wordDocument = WordprocessingDocument.Create(stream, type))
{
MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
mainDocumentPart.PutXDocument(new XDocument(CreateDocument(runTexts)));
}
return stream;
}
private static XElement CreateDocument(IEnumerable<string> runTexts)
{
// Produce a w:document with a single w:p that contains:
// (1) one italic run with some lead-in, i.e., "Hello " in this example;
// (2) one or more bold runs for the placeholder, which might or might not be split;
// (3) one run with just a space; and
// (4) one run with a symbol (i.e., a Wingdings smiley face).
return new XElement(W.document,
new XAttribute(XNamespace.Xmlns + "w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main"),
new XElement(W.body,
new XElement(W.p,
new XElement(W.r,
new XElement(W.rPr,
new XElement(W.i)),
new XElement(W.t,
new XAttribute(XNamespace.Xml + "space", "preserve"),
"Hello ")),
runTexts.Select(rt =>
new XElement(W.r,
new XElement(W.rPr,
new XElement(W.b)),
new XElement(W.t, rt))),
new XElement(W.r,
new XElement(W.t,
new XAttribute(XNamespace.Xml + "space", "preserve"),
" ")),
new XElement(W.r,
new XElement(W.sym,
new XAttribute(W.font, "Wingdings"),
new XAttribute(W._char, "F04A"))))));
}
private static void AssertReplacementWasSuccessful(MemoryStream stream, string replacement)
{
using WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false);
XElement document = wordDocument.MainDocumentPart.GetXElement();
XElement paragraph = document.Descendants(W.p).Single();
List<XElement> runs = paragraph.Elements(W.r).ToList();
// We have the expected number of runs, i.e., the lead-in, the first name,
// a space character, and the symbol.
Assert.Equal(4, runs.Count);
// We still have the lead-in "Hello " and it is still formatted in italics.
Assert.True(runs[0].Value == "Hello " && runs[0].Elements(W.rPr).Elements(W.i).Any());
// We have successfully replaced our "Firstname" placeholder and the
// concrete first name is formatted in bold, exactly like the placeholder.
Assert.True(runs[1].Value == replacement && runs[1].Elements(W.rPr).Elements(W.b).Any());
// We still have the space between the first name and the symbol and it
// is unformatted.
Assert.True(runs[2].Value == " " && !runs[2].Elements(W.rPr).Any());
// Finally, we still have our smiley face symbol run.
Assert.True(IsSymbolRun(runs[3], "Wingdings", "F04A"));
}
private static bool IsSymbolRun(XElement run, string fontValue, string charValue)
{
XElement sym = run.Elements(W.sym).FirstOrDefault();
if (sym == null) return false;
return (string) sym.Attribute(W.font) == fontValue &&
(string) sym.Attribute(W._char) == charValue;
}
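To tie this back to the question, the ReplacePlaceholders method could be adapted along the following lines. This is a sketch rather than a tested drop-in replacement. It assumes the Open XML SDK and Open-Xml-PowerTools packages are referenced, that the placeholders are the plain property names, and that only the main document part needs processing (headers, footers, and footnotes are separate parts). The class name TemplateFiller is made up for this example. The Open-Xml-PowerTools calls are the ones used in the unit test above.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static class TemplateFiller
{
    public static void ReplacePlaceholders<T>(string templateDocumentPath, T templateObject)
        where T : class
    {
        using WordprocessingDocument wordDocument =
            WordprocessingDocument.Open(templateDocumentPath, true);

        // Read and cache the root element (w:document) of the main document part.
        XElement document = wordDocument.MainDocumentPart.GetXDocument().Root;

        foreach (var prop in templateObject.GetType().GetProperties())
        {
            // Escape the property name so that it is matched literally.
            var regex = new Regex(Regex.Escape(prop.Name));

            // Treat null property values as empty strings.
            string replacement =
                prop.GetValue(templateObject)?.ToString() ?? string.Empty;

            // Replace the placeholder in all paragraphs, even when it is
            // split across several runs.
            OpenXmlRegex.Replace(document.Descendants(W.p), regex, replacement, null);
        }

        // Write the changed root element back to the main document part.
        wordDocument.MainDocumentPart.PutXDocument();
    }
}
```

A call such as TemplateFiller.ReplacePlaceholders(path, new { Firstname = "Bernie" }) would then replace every occurrence of "Firstname" in the body paragraphs. In a real template, a more distinctive placeholder syntax than the bare property name is advisable, so that ordinary words are not replaced by accident.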
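Why is InnerText not the solution?
It can be tempting to use the InnerText property of the Paragraph class (or of other subclasses of the OpenXmlElement class). The problem is that it ignores any markup other than text (w:t) elements. For example, if your paragraph contains symbols (w:sym elements, such as the smiley face used in the example above), those are lost because the InnerText property does not consider them. The following unit test demonstrates this:
[Theory]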
[InlineData("Hello Firstname ", new[] { "Firstname" })]
[InlineData("Hello Firstname ", new[] { "F", "irstname" })]
[InlineData("Hello Firstname ", new[] { "F", "i", "r", "s", "t", "n", "a", "m", "e" })]
public void InnerText_ParagraphWithSymbols_SymbolIgnored(string expectedInnerText, IEnumerable<string> runTexts)
{
// Create Word document with smiley face symbol at the end.
using MemoryStream stream = CreateWordprocessingDocument(runTexts);
using WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false);
Document document = wordDocument.MainDocumentPart.Document;
Paragraph paragraph = document.Descendants<Paragraph>().Single();
string innerText = paragraph.InnerText;
// Note that the innerText does not contain the smiley face symbol.
Assert.Equal(expectedInnerText, innerText);
}
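Note that in simple use cases you might not need to consider all of the above. However, if you have to deal with real-life documents or with the markup changes made by Microsoft Word, chances are that you cannot ignore the complexity. And wait until you have to deal with revision markup...
As always, the full source code can be found in my CodeSnippets GitHub repository. Look for the OpenXmlRegexTests class.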