How to handle smartTag nodes with OpenXML


I have a C# application that uses OpenXML to read text from Word (.docx) files.

Typically, a document consists of paragraphs (p) that contain run elements (r). I can iterate over the run nodes with:

foreach ( var run in para.Descendants<Run>() )
{
  ...
}

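One particular document contains the text "START", which is split into three parts: "ST", "AR" and "T". Each part is defined by a run node, but in two of the three cases the run node is wrapped in a smartTag node.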

<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="PersonName">
    <w:r w:rsidRPr="00BF444F">
        <w:rPr>
            <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
            <w:b/>
            <w:bCs/>
            <w:sz w:val="40"/>
            <w:szCs w:val="40"/>
        </w:rPr>
        <w:t>ST</w:t>
    </w:r>
</w:smartTag>
<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="PersonName">
    <w:r w:rsidRPr="00BF444F">
        <w:rPr>
            <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
            <w:b/>
            <w:bCs/>
            <w:sz w:val="40"/>
            <w:szCs w:val="40"/>
        </w:rPr>
        <w:t>AR</w:t>
    </w:r>
</w:smartTag>
<w:r w:rsidRPr="00BF444F">
    <w:rPr>
        <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
        <w:b/>
        <w:bCs/>
        <w:sz w:val="40"/>
        <w:szCs w:val="40"/>
    </w:rPr>
    <w:t xml:space="preserve">T</w:t>
</w:r>

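As far as I can tell, OpenXML does not support smartTag nodes. As a result, it only produces OpenXmlUnknownElement nodes for them.

What makes this difficult is that it produces OpenXmlUnknownElement nodes for all descendants of the smartTag as well. This means I cannot simply take the first child node and cast it to Run.

Getting the text through the InnerText property is easy, but I also need the formatting information.

Is there a reasonably simple way to handle this?

So far, my best idea is to write a preprocessor that removes the smart tag nodes.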


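Edit

Following up on Cindy Meister's comment.

I am using OpenXml version 2.7.2. As Cindy pointed out, OpenXML 2.0 has a SmartTagRun class. I was not aware of that class.

I found the following information on the page What's new in the Open XML SDK 2.5 for Office:

Smart tags

Because smart tags were deprecated in Office 2010, the Open XML SDK 2.5 doesn't support smart tag related Open XML elements. The Open XML SDK 2.5 can still process smart tag elements as unknown elements; however, the Open XML SDK 2.5 Productivity Tool for Office validates those elements (see the following list) in Office document files as invalid tags.

So it sounds like one possible solution would be to use OpenXML 2.0.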

c# openxml openxml-sdk
1 Answer

The solution is to use LINQ to XML (or the System.Xml classes, if you prefer) to remove the w:smartTag elements, as shown in the following code:

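using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using Xunit;

public class SmartTagTests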
{
    private const string Xml =
        @"<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
<w:body>
    <w:p>
        <w:smartTag w:uri=""urn:schemas-microsoft-com:office:smarttags"" w:element=""PersonName"">
            <w:r w:rsidRPr=""00BF444F"">
                <w:rPr>
                    <w:rFonts w:ascii=""Arial"" w:hAnsi=""Arial"" w:cs=""Arial""/>
                    <w:b/>
                    <w:bCs/>
                    <w:sz w:val=""40""/>
                    <w:szCs w:val=""40""/>
                </w:rPr>
                <w:t>ST</w:t>
            </w:r>
        </w:smartTag>
        <w:smartTag w:uri=""urn:schemas-microsoft-com:office:smarttags"" w:element=""PersonName"">
            <w:r w:rsidRPr=""00BF444F"">
                <w:rPr>
                    <w:rFonts w:ascii=""Arial"" w:hAnsi=""Arial"" w:cs=""Arial""/>
                    <w:b/>
                    <w:bCs/>
                    <w:sz w:val=""40""/>
                    <w:szCs w:val=""40""/>
                </w:rPr>
                <w:t>AR</w:t>
            </w:r>
        </w:smartTag>
        <w:r w:rsidRPr=""00BF444F"">
            <w:rPr>
                <w:rFonts w:ascii=""Arial"" w:hAnsi=""Arial"" w:cs=""Arial""/>
                <w:b/>
                <w:bCs/>
                <w:sz w:val=""40""/>
                <w:szCs w:val=""40""/>
            </w:rPr>
            <w:t xml:space=""preserve"">T</w:t>
        </w:r>
    </w:p>
</w:body>
</w:document>";

    [Fact]
    public void CanStripSmartTags()
    {
        // Say you have a WordprocessingDocument stored on a stream (e.g., read from a file).
        using Stream stream = CreateTestWordprocessingDocument();

        // Open the WordprocessingDocument and inspect it using the strongly typed classes.
        // This shows that OpenXmlUnknownElement instances are found and that only a
        // single Run instance is recognized.
        using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false))
        {
            // Get the w:document as a strongly typed Document instance and demonstrate
            // that only the one Run outside the w:smartTag elements is recognized.
            MainDocumentPart part = wordDocument.MainDocumentPart;
            Document document = part.Document;

            Assert.Single(document.Descendants<Run>());
            Assert.NotEmpty(document.Descendants<OpenXmlUnknownElement>());
        }

        // Now, open that WordprocessingDocument to make edits, using Linq to XML.
        // Do NOT use the strongly typed classes in this context.
        using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
        {
            // Get the w:document as an XElement and demonstrate that this w:document contains
            // w:smartTag elements.
            MainDocumentPart part = wordDocument.MainDocumentPart;
            string xml = ReadString(part);
            XElement document = XElement.Parse(xml);

            Assert.NotEmpty(document.Descendants().Where(d => d.Name.LocalName == "smartTag"));

            // Transform the w:document, stripping all w:smartTag elements and demonstrate
            // that the transformed w:document no longer contains w:smartTag elements.
            var transformedDocument = (XElement) StripSmartTags(document);

            Assert.Empty(transformedDocument.Descendants().Where(d => d.Name.LocalName == "smartTag"));

            // Write the transformed document back to the part.
            WriteString(part, transformedDocument.ToString(SaveOptions.DisableFormatting));
        }

        // Open the WordprocessingDocument again and inspect it using the strongly typed classes.
        // This demonstrates that all Run instances are now recognized.
        using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false))
        {
            // Now, get the w:document as a strongly typed Document instance and demonstrate
            // that the document contains three Run instances.
            MainDocumentPart part = wordDocument.MainDocumentPart;
            Document document = part.Document;

            Assert.Equal(3, document.Descendants<Run>().Count());
            Assert.Empty(document.Descendants<OpenXmlUnknownElement>());
        }
    }

    /// <summary>
    /// Recursive, pure functional transform that removes all w:smartTag elements.
    /// </summary>
    /// <param name="node">The <see cref="XNode" /> to be transformed.</param>
    /// <returns>The transformed <see cref="XNode" />.</returns>
    private static object StripSmartTags(XNode node)
    {
        if (!(node is XElement element))
        {
            return node;
        }

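        if (element.Name.LocalName == "smartTag")
        {
            // Unwrap the smart tag: drop its w:smartTagPr properties, if any, and
            // transform the remaining children so nested smart tags are removed too.
            return element.Elements()
                .Where(e => e.Name.LocalName != "smartTagPr")
                .Select(e => StripSmartTags(e));
        }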

        return new XElement(element.Name, element.Attributes(),
            element.Nodes().Select(StripSmartTags));
    }

    private static Stream CreateTestWordprocessingDocument()
    {
        var stream = new MemoryStream();

        using var wordDocument = WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document);
        MainDocumentPart part = wordDocument.AddMainDocumentPart();
        WriteString(part, Xml);

        return stream;
    }

    #region Generic Open XML Utilities

    private static string ReadString(OpenXmlPart part)
    {
        using Stream stream = part.GetStream(FileMode.Open, FileAccess.Read);
        using var streamReader = new StreamReader(stream);
        return streamReader.ReadToEnd();
    }

    private static void WriteString(OpenXmlPart part, string text)
    {
        using Stream stream = part.GetStream(FileMode.Create, FileAccess.Write);
        using var streamWriter = new StreamWriter(stream);
        streamWriter.Write(text);
    }

    #endregion
}
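
If the document only needs to be read, not rewritten, the runs and their formatting can probably also be pulled straight from the part's XML with LINQ to XML, because Descendants finds every w:r whether or not it sits inside a w:smartTag. The following is a minimal sketch under that assumption. RunReader, ReadRuns and IsOn are hypothetical names that are not part of the Open XML SDK, and only direct run formatting (w:b and w:sz) is read; formatting inherited from styles is not resolved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class RunReader
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text, bold flag and font size (in half-points, as stored in w:sz)
    // of every w:r below root, including runs wrapped in w:smartTag elements.
    public static List<(string Text, bool Bold, int? HalfPoints)> ReadRuns(XElement root)
    {
        return root.Descendants(W + "r")
            .Select(r =>
            {
                XElement rPr = r.Element(W + "rPr");
                string text = string.Concat(r.Elements(W + "t").Select(t => t.Value));
                bool bold = IsOn(rPr?.Element(W + "b"));
                string sz = (string) rPr?.Element(W + "sz")?.Attribute(W + "val");
                return (text, bold, sz == null ? (int?) null : int.Parse(sz));
            })
            .ToList();
    }

    // A toggle property such as w:b counts as on when the element is present and
    // its w:val attribute is either absent or not one of "false", "0" or "off".
    private static bool IsOn(XElement toggle)
    {
        if (toggle == null)
        {
            return false;
        }

        string val = (string) toggle.Attribute(W + "val");
        return val == null || (val != "false" && val != "0" && val != "off");
    }
}
```

With the ReadString helper from the code above, this could be called as RunReader.ReadRuns(XElement.Parse(ReadString(part))). For the sample document it should yield three bold runs with the texts "ST", "AR" and "T", each with a size of 40 half-points.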

You can also use PowerTools for Open XML, which provides a markup simplifier that directly supports removing w:smartTag elements.
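
As a rough sketch of what that might look like, assuming the OpenXmlPowerTools package is referenced and that its MarkupSimplifier.SimplifyMarkup method and SimplifyMarkupSettings.RemoveSmartTags property are available in the version you install (check this against the package you use). SmartTagCleaner and RemoveSmartTags(string) are hypothetical names introduced here for illustration.

```csharp
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Hypothetical wrapper around the PowerTools markup simplifier.
public static class SmartTagCleaner
{
    // Opens the .docx file at the given path for editing and asks the
    // markup simplifier to remove smart tags, leaving other markup alone.
    public static void RemoveSmartTags(string path)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            var settings = new SimplifyMarkupSettings { RemoveSmartTags = true };
            MarkupSimplifier.SimplifyMarkup(doc, settings);
        }
    }
}
```

Because this modifies the file in place, it is probably safest to run it on a copy of the document and then read that copy with the strongly typed classes.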
