OpenXML find and replace text

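Environment

Visual Studio 2017, C# (Word .docx file)

Problem

The find/replace only replaces "{Today}"; it fails to replace the "{ConsultantName}" field. I have inspected the document and tried different approaches (see the commented-out code), but no joy.

The Word document contains only a few paragraphs of text; there are no tables or text boxes in it. What am I doing wrong?

Update

When I inspect the doc_text string, I can see "{Today}" intact, but "{ConsultantName}" is split across multiple runs. The opening and closing braces are not kept together with the word; there are XML tags between them: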

{</w:t></w:r><w:proofErr w:type="spellStart"/><w:r w:rsidR="00544806"><w:t>ConsultantName</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r w:rsidR="00544806"><w:t>}
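The fragment above is why a search over the raw XML misses the placeholder. Here is a minimal sketch of the effect, assuming a hypothetical paragraph modelled on that fragment and using System.Xml.Linq instead of the Open XML SDK so that it runs on its own. The class and member names are illustrative only.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SplitRunDemo
{
    // Main WordprocessingML namespace, bound to the "w" prefix in document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical paragraph: three runs separated by proofing markers,
    // following the shape of the fragment quoted in the question.
    public const string ParagraphXml =
        "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:r><w:t>{</w:t></w:r>" +
        "<w:proofErr w:type=\"spellStart\"/>" +
        "<w:r w:rsidR=\"00544806\"><w:t>ConsultantName</w:t></w:r>" +
        "<w:proofErr w:type=\"spellEnd\"/>" +
        "<w:r w:rsidR=\"00544806\"><w:t>}</w:t></w:r>" +
        "</w:p>";

    // Joins the text of every w:t element in document order,
    // which is the text a reader sees in Word.
    public static string VisibleText(string paragraphXml)
    {
        XElement paragraph = XElement.Parse(paragraphXml);
        return string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
    }
}
```

Here `SplitRunDemo.ParagraphXml.Contains("{ConsultantName}")` is false, while `SplitRunDemo.VisibleText(SplitRunDemo.ParagraphXml)` returns the full placeholder. That suggests any replacement has to work on the joined run text rather than on the XML string.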

Code

    string doc_text = string.Empty;
    List<string> s_find = new List<string>();
    List<string> s_replace = new List<string>();
    // Regex regexText = null;

    s_find.Add("{Today}");
    s_replace.Add("24 Sep 2018");
    s_find.Add("{ConsultantName}");
    s_replace.Add("John Doe");

    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filePath, true))
    {
        // read document
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            doc_text = sr.ReadToEnd();
        }

        // find replace
        for (byte b = 0; b < s_find.Count; b++)
        {
            doc_text = new Regex(s_find[b], RegexOptions.IgnoreCase).Replace(doc_text, s_replace[b]);
            // regexText = new Regex(s_find[b]);
            // doc_text = doc_text.Replace(s_find[b], s_replace[b]);
            // doc_text = regexText.Replace(doc_text, s_replace[b]);
        }

        // update document
        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(doc_text);
        }
    }
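A side note on the loop above: the search strings are passed to Regex as patterns, and the replacement strings as regex replacement patterns. "{Today}" happens to be read literally, but a key such as "{1}", or one containing "." or "(", would be interpreted as regex syntax, and a replacement containing "$0" would be expanded. Below is a minimal sketch of a literal, case-insensitive replace. The helper name ReplaceLiteral is hypothetical, and it does not address the split-run problem.

```csharp
using System.Text.RegularExpressions;

public static class LiteralReplace
{
    // Treats both the search key and the replacement as plain text.
    public static string ReplaceLiteral(string input, string find, string replacement)
    {
        // Regex.Escape neutralises regex metacharacters in the key; the
        // MatchEvaluator returns the replacement verbatim, so sequences
        // such as "$0" are not expanded.
        return Regex.Replace(input, Regex.Escape(find), _ => replacement, RegexOptions.IgnoreCase);
    }
}
```

For example, `LiteralReplace.ReplaceLiteral(doc_text, s_find[b], s_replace[b])` could stand in for the Regex call inside the loop.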
3 Answers

2 votes

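Note: I wanted to avoid using Word Interop. I did not want to create an instance of Word and use Word's object model to do the find/replace.

There is no way to prevent Word from splitting text into multiple runs. It happens even if you type the text directly into the document, make no changes, and apply no formatting.

However, I solved the problem by adding custom fields to the document, as follows:

  • Open the Word document. Go to File -> Info
  • Click the Properties heading and select Advanced Properties
  • Select the Custom tab.
  • Add the field names you want to use and save.
  • In the document, click Insert on the main menu
  • Click the Explore Quick Parts icon and select Field...
  • Open the Categories drop-down and select Document Information
  • Under Field names: select DocProperty
  • In the "Property" list, select your custom field name and click OK.

This inserts the field into your document. Even if you apply formatting, the field name stays intact and is not split across multiple runs.

Update

To save users the tedious task of manually adding many custom properties to a document, I wrote a method that does it using OpenXML.

Add the following usings: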

using System;
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.CustomProperties;
using DocumentFormat.OpenXml.VariantTypes;

Code to add custom (text) properties to the document:

static public bool RunWordDocumentAddProperties(string filePath, List<string> strName, List<string> strVal)
{
    bool is_ok = true;
    try
    {
        if (File.Exists(filePath) == false)
            return false;                

        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filePath, true))
        {
            var customProps = wordDoc.CustomFilePropertiesPart;
            if (customProps == null)
            {
                // no custom properties? Add the part, and the collection of properties
                customProps = wordDoc.AddCustomFilePropertiesPart();
                customProps.Properties = new DocumentFormat.OpenXml.CustomProperties.Properties();
            }
            // int rather than byte, so lists with more than 255 entries do not overflow the counter
            for (int b = 0; b < strName.Count; b++)
            {
                var props = customProps.Properties;
                if (props != null)
                {
                    var newProp = new CustomDocumentProperty();
                    newProp.VTLPWSTR = new VTLPWSTR(strVal[b]);
                    newProp.FormatId = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}";
                    newProp.Name = strName[b];

                    // append the new property, and fix up all the property ID values
                    // property ID values must start at 2
                    props.AppendChild(newProp);
                    int pid = 2;
                    foreach (CustomDocumentProperty item in props)
                    {
                        item.PropertyId = pid++;
                    }
                    props.Save();
                }
            }                    
        }
    }
    catch (Exception ex)
    {
        is_ok = false;
        ProcessError(ex);
    }
    return is_ok;
}

0 votes

You only need to do this:

*.csproj

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="DocumentFormat.OpenXml" Version="2.12.3" />
  </ItemGroup>

</Project>

Add these usings:

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

And put this code in your application:

using (WordprocessingDocument wordprocessingDocument =
            WordprocessingDocument.Open(filepath, true))
        {
            var body = wordprocessingDocument.MainDocumentPart.Document.Body;

            var paras = body.Elements<Paragraph>();

            foreach (var para in paras)
            {
                foreach (var run in para.Elements<Run>())
                {
                    foreach (var text in run.Elements<Text>())
                    {
                        if (text.Text.Contains("#_KEY_1_#"))
                        {
                            text.Text = text.Text.Replace("#_KEY_1_#", "replaced-text");
                        }
                    }
                }
            }
        }

Done.
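Note that the loop above only matches a key that sits entirely inside one Text element, so a placeholder split across runs (as in the question) is skipped. Below is a minimal sketch of one workaround, assuming it is acceptable for the paragraph to take the formatting of its first run: join the text of all runs, replace on the joined string, write the result into the first text element, and blank the rest. It uses System.Xml.Linq on the paragraph XML so that it is self-contained. ParagraphReplacer and ReplaceInParagraph are hypothetical names, not part of the Open XML SDK.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphReplacer
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces keys in one w:p element even when a key spans several runs.
    // Trade-off: after a match, the whole paragraph text lives in the first
    // w:t element, so per-run formatting inside the paragraph is lost.
    public static void ReplaceInParagraph(XElement paragraph, IDictionary<string, string> replacements)
    {
        List<XElement> texts = paragraph.Descendants(W + "t").ToList();
        if (texts.Count == 0) return;

        string original = string.Concat(texts.Select(t => t.Value));
        string updated = original;
        foreach (KeyValuePair<string, string> pair in replacements)
        {
            updated = updated.Replace(pair.Key, pair.Value);
        }
        if (updated == original) return; // nothing matched; leave the runs untouched

        texts[0].Value = updated;
        // Ask consumers to keep leading and trailing spaces in the merged text.
        texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");
        foreach (XElement extra in texts.Skip(1))
        {
            extra.Value = string.Empty;
        }
    }
}
```

The same idea should carry over to the SDK types by collecting the Text elements of each Paragraph instead of the w:t elements here; that variant is not shown and would need testing against real documents.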


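0 votes

I would like to share a solution for replacing text in Word, Excel, and PowerPoint documents. It uses a simple dictionary-based approach, which gives you an easy way to define your replacements.

Usage: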

Dictionary<string, string> replacements = new()
{
    { "##keyword##", sampleText },
};

IFileHandler fileHandler = FileHandlerFactory.Create(yourFileExtension); //docx, xlsx, pptx

byte[] updatedFile = fileHandler.UpdateFile(originalFile, replacements); //byte[]

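Best practice: I recommend using ## as the prefix and suffix of each keyword and inserting the keyword in lowercase. This makes keyword recognition and replacement reliable.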
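One benefit of a fixed delimiter is that leftover keywords are easy to detect. Here is a minimal sketch, assuming tokens consist of ##, then lowercase letters, digits or underscores, then ##. The exact token shape and the KeywordScanner name are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordScanner
{
    // Assumed token shape: "##", lowercase letters/digits/underscores, "##".
    private static readonly Regex Token = new Regex("##[a-z0-9_]+##");

    // Returns the distinct tokens in the text that have no dictionary entry,
    // in order of first appearance.
    public static List<string> FindUnmapped(string text, IDictionary<string, string> replacements)
    {
        return Token.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .Where(token => !replacements.ContainsKey(token))
            .ToList();
    }
}
```

Running this over the extracted text before calling UpdateFile would flag keywords in the template that the replacements dictionary does not cover.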

Code:

public interface IFileHandler
{
    byte[] UpdateFile(byte[] file, Dictionary<string, string> replacements);
}

public static class FileHandlerFactory
{
    public static IFileHandler Create(string fileExtension)
    {
        return fileExtension switch
        {
            "docx" => new WordDocumentHandler(),
            "xlsx" => new ExcelHandler(),
            "pptx" => new PowerPointHandler(),
            _ => throw new NotSupportedException("File type not supported"),
        };
    }
}

public class WordDocumentHandler : IFileHandler
{
    public byte[] UpdateFile(byte[] file, Dictionary<string, string> replacements)
    {
        string temporaryFilePath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".docx");
        File.WriteAllBytes(temporaryFilePath, file);

        using (WordprocessingDocument document = WordprocessingDocument.Open(temporaryFilePath, isEditable: true))
        {
            MainDocumentPart? documentPart = document.MainDocumentPart;

            if (documentPart != null)
            {
                var header = documentPart.HeaderParts.SelectMany(part => part.RootElement!.Descendants<DocumentFormat.OpenXml.Wordprocessing.Text>());
                var body = documentPart.Document.Descendants<DocumentFormat.OpenXml.Wordprocessing.Text>();
                var footer = documentPart.FooterParts.SelectMany(part => part.RootElement!.Descendants<DocumentFormat.OpenXml.Wordprocessing.Text>());

                var allText = header.Concat(body).Concat(footer);

                foreach (var textElement in allText)
                {
                    string textContent = textElement.Text;
                    foreach (var replacement in replacements.Where(replacement => textContent.Contains(replacement.Key)))
                    {
                        // Replace only the keyword, not the whole text of the element.
                        textElement.Text = textElement.Text.Replace(replacement.Key, replacement.Value);
                    }
                }

                document.Save();
            }
        }

        return File.ReadAllBytes(temporaryFilePath);
    }
}

public class ExcelHandler : IFileHandler
{
    public byte[] UpdateFile(byte[] file, Dictionary<string, string> replacements)
    {
        string temporaryFilePath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".xlsx");
        File.WriteAllBytes(temporaryFilePath, file);

        using (SpreadsheetDocument document = SpreadsheetDocument.Open(temporaryFilePath, isEditable: true))
        {
            WorkbookPart? workbookPart = document.WorkbookPart;

            if (workbookPart != null)
            {
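                // CalculationProperties is an optional element, so create it if the workbook has none.
                workbookPart.Workbook.CalculationProperties ??= new CalculationProperties();
                workbookPart.Workbook.CalculationProperties.ForceFullCalculation = true;
                workbookPart.Workbook.CalculationProperties.FullCalculationOnLoad = true;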

                SharedStringTablePart? sharedStringTablePart = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();

                if (sharedStringTablePart != null)
                {
                    foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
                    {
                        GetCells(worksheetPart).ForEach(cell => ProcessCell(replacements, cell, sharedStringTablePart));
                    }
                }

                document.Save();
            }
        }

        return File.ReadAllBytes(temporaryFilePath);
    }

    private static List<Cell> GetCells(WorksheetPart worksheetPart)
    {
        return worksheetPart.Worksheet.Elements<SheetData>().SelectMany(i => i.Elements<Row>())
            .SelectMany(i => i.Elements<Cell>()).ToList();
    }

    private static void ProcessCell(Dictionary<string, string> replacements, Cell cell, SharedStringTablePart sharedStringTablePart)
    {
        bool isValidCell = cell.DataType != null && cell.DataType.Value == CellValues.SharedString && cell.CellValue != null;

        if (isValidCell)
        {
            int sharedStringIndex = int.Parse(cell.CellValue.InnerText);
            SharedStringItem sharedStringItem = sharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(sharedStringIndex);

            string? text = sharedStringItem.Text?.Text;
            foreach (var replacement in replacements.Where(replacement => !string.IsNullOrEmpty(text) && text.Contains(replacement.Key)))
            {
                cell.CellValue = new CellValue(replacement.Value);
                cell.DataType = new EnumValue<CellValues>(CellValues.String);
            }
        }
    }
}

public class PowerPointHandler : IFileHandler
{
    public byte[] UpdateFile(byte[] file, Dictionary<string, string> replacements)
    {
        string temporaryFilePath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".pptx");
        File.WriteAllBytes(temporaryFilePath, file);

        using (PresentationDocument document = PresentationDocument.Open(temporaryFilePath, isEditable: true))
        {
            PresentationPart? presentationPart = document.PresentationPart;

            if (presentationPart != null)
            {
                foreach (SlideMasterPart slideMasterPart in presentationPart.SlideMasterParts)
                {
                    ReplaceText(slideMasterPart.SlideMaster.Descendants<DocumentFormat.OpenXml.Drawing.Text>(), replacements);
                }

                foreach (SlidePart slidePart in presentationPart.SlideParts)
                {
                    ReplaceText(slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Text>(), replacements);
                }
            }

            document.Save();
        }

        return File.ReadAllBytes(temporaryFilePath);
    }

    private static void ReplaceText(IEnumerable<DocumentFormat.OpenXml.Drawing.Text> texts, Dictionary<string, string> replacements)
    {
        foreach (var text in texts)
        {
            foreach (var replacement in replacements.Where(replacement => text.Text.Contains(replacement.Key)))
            {
                text.Text = text.Text.Replace(replacement.Key, replacement.Value);
            }
        }
    }
}

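Key points:

  • Handles headers/footers (Word): ensures text is replaced throughout the whole document.
  • Calculated cells (Excel): accurately updates text inside calculated cells.
  • Slide masters (PowerPoint): replaces text in slide masters for consistency.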