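When working with Office Open XML documents (for example, documents created by Word, Excel, or PowerPoint since the release of Office 2007), you will often want to clone, or copy, an existing document and then make changes to that clone, thereby creating a new document.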
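Several questions on this topic have already been asked and answered (sometimes incorrectly, or at least not optimally), which shows that users really do run into problems here.
So, the question is: how do you clone or copy such documents correctly, and which approach is the most efficient?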
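The following sample class shows several ways to correctly copy pretty much any file and return the copy on a MemoryStream or FileStream. From that stream you can then open a WordprocessingDocument (Word), SpreadsheetDocument (Excel), or PresentationDocument (PowerPoint) and make whatever changes you need, using the Open XML SDK and, optionally, the Open-XML-PowerTools.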
using System.IO;

namespace CodeSnippets.IO
{
    /// <summary>
    /// This class demonstrates multiple ways to clone files stored in the file system.
    /// In all cases, the source file is stored in the file system. Where the return type
    /// is a <see cref="MemoryStream"/>, the destination file will be stored only on that
    /// <see cref="MemoryStream"/>. Where the return type is a <see cref="FileStream"/>,
    /// the destination file will be stored in the file system and opened on that
    /// <see cref="FileStream"/>.
    /// </summary>
    /// <remarks>
    /// The contents of the <see cref="MemoryStream"/> instances returned by the sample
    /// methods can be written to a file as follows:
    ///
    ///     var stream = ReadAllBytesToMemoryStream(sourcePath);
    ///     File.WriteAllBytes(destPath, stream.ToArray());
    ///
    /// <see cref="MemoryStream.ToArray"/> copies exactly Length bytes of the internal
    /// buffer to a new byte array. <see cref="MemoryStream.GetBuffer"/> avoids that copy
    /// and should be a tad faster. It is available where the MemoryStream was created
    /// using <see cref="MemoryStream()"/> or <see cref="MemoryStream(int)"/>, but it
    /// returns the whole internal buffer, which can contain unused bytes beyond Length
    /// once the stream has grown. When using it, write only the first Length bytes.
    /// </remarks>
    public static class FileCloner
    {
        // Reads the whole file into a byte array and copies it onto a MemoryStream.
        public static MemoryStream ReadAllBytesToMemoryStream(string path)
        {
            byte[] buffer = File.ReadAllBytes(path);
            var destStream = new MemoryStream(buffer.Length);
            destStream.Write(buffer, 0, buffer.Length);
            destStream.Seek(0, SeekOrigin.Begin);
            return destStream;
        }

        // Copies the file from a read-only FileStream onto a MemoryStream.
        public static MemoryStream CopyFileStreamToMemoryStream(string path)
        {
            using FileStream sourceStream = File.OpenRead(path);
            var destStream = new MemoryStream((int) sourceStream.Length);
            sourceStream.CopyTo(destStream);
            destStream.Seek(0, SeekOrigin.Begin);
            return destStream;
        }

        // Copies the file stream-to-stream and returns the open destination FileStream.
        public static FileStream CopyFileStreamToFileStream(string sourcePath, string destPath)
        {
            using FileStream sourceStream = File.OpenRead(sourcePath);
            FileStream destStream = File.Create(destPath);
            sourceStream.CopyTo(destStream);
            destStream.Seek(0, SeekOrigin.Begin);
            return destStream;
        }

        // Copies the file in the file system, overwriting any existing destination,
        // and then opens the copy for exclusive read-write access.
        public static FileStream CopyFileAndOpenFileStream(string sourcePath, string destPath)
        {
            File.Copy(sourcePath, destPath, true);
            return new FileStream(destPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
        }
    }
}
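The remarks above mention both GetBuffer() and ToArray() for getting the bytes out of a MemoryStream. As a minimal sketch of the difference, assuming a stream that has grown past its initial capacity (as can happen when a document is edited on it), the following compares the two and writes only the valid bytes to a file. The class name, the `SaveToFile` helper, and the temp file name are hypothetical and not part of FileCloner.

```csharp
using System;
using System.IO;

public static class MemoryStreamSaveDemo
{
    // Hypothetical helper: writes exactly stream.Length bytes to destPath.
    // MemoryStream.WriteTo() writes the entire stream content, regardless of
    // the current Position, and does not create an intermediate array.
    public static void SaveToFile(MemoryStream stream, string destPath)
    {
        using FileStream file = File.Create(destPath);
        stream.WriteTo(file);
    }

    public static void Main()
    {
        // Start with a capacity of 4 bytes, then write 5 bytes so that the
        // internal buffer has to grow.
        var stream = new MemoryStream(4);
        stream.Write(new byte[] { 1, 2, 3, 4, 5 }, 0, 5);

        byte[] raw = stream.GetBuffer(); // whole internal buffer, may include unused bytes
        byte[] exact = stream.ToArray(); // new array holding exactly stream.Length bytes

        Console.WriteLine($"Length:    {stream.Length}");
        Console.WriteLine($"ToArray:   {exact.Length}");
        Console.WriteLine($"GetBuffer: {raw.Length} (at least Length, possibly more)");

        // Hypothetical destination path, used only for this demonstration.
        string destPath = Path.Combine(Path.GetTempPath(), "memory-stream-save-demo.bin");
        SaveToFile(stream, destPath);
        Console.WriteLine($"File size: {new FileInfo(destPath).Length}");
        File.Delete(destPath);
    }
}
```

If you do want to avoid the copy made by ToArray(), writing `stream.GetBuffer()` with an explicit count of `(int) stream.Length` or calling `WriteTo()` as above keeps trailing unused bytes out of the destination file.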
On top of the Open XML-agnostic methods above, you can also use the following approach, for example if you have already opened an OpenXmlPackage such as a WordprocessingDocument, SpreadsheetDocument, or PresentationDocument:
public void DoWorkCloningOpenXmlPackage()
{
    using WordprocessingDocument sourceWordDocument = WordprocessingDocument.Open(SourcePath, false);

    // There are multiple overloads of the Clone() method in the Open XML SDK.
    // This one clones the source document to the given destination path and
    // opens it in read-write mode.
    using var wordDocument = (WordprocessingDocument) sourceWordDocument.Clone(DestPath, true);

    ChangeWordprocessingDocument(wordDocument);
}
All of the methods above clone or copy a document correctly. But which one is the most efficient?
Enter our benchmark, which uses the BenchmarkDotNet NuGet package:
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Linq;
using BenchmarkDotNet.Attributes;
using CodeSnippets.IO;
using CodeSnippets.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace CodeSnippets.Benchmarks.IO
{
    public class FileClonerBenchmark
    {
        #region Setup and Helpers

        private const string SourcePath = "Source.docx";
        private const string DestPath = "Destination.docx";

        [Params(1, 10, 100, 1000)]
        public static int ParagraphCount;

        [GlobalSetup]
        public void GlobalSetup()
        {
            CreateTestDocument(SourcePath);
            CreateTestDocument(DestPath);
        }

        private static void CreateTestDocument(string path)
        {
            const string sentence = "The quick brown fox jumps over the lazy dog.";
            string text = string.Join(" ", Enumerable.Range(0, 22).Select(i => sentence));
            IEnumerable<string> texts = Enumerable.Range(0, ParagraphCount).Select(i => text);
            using WordprocessingDocument unused = WordprocessingDocumentFactory.Create(path, texts);
        }

        private static void ChangeWordprocessingDocument(WordprocessingDocument wordDocument)
        {
            Body body = wordDocument.MainDocumentPart.Document.Body;
            Text text = body.Descendants<Text>().First();
            text.Text = DateTimeOffset.UtcNow.Ticks.ToString();
        }

        #endregion

        #region Benchmarks

        [Benchmark(Baseline = true)]
        public void DoWorkUsingReadAllBytesToMemoryStream()
        {
            using MemoryStream destStream = FileCloner.ReadAllBytesToMemoryStream(SourcePath);

            using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true))
            {
                ChangeWordprocessingDocument(wordDocument);
            }

            // Note: GetBuffer() returns the whole internal buffer, which may include
            // unused bytes beyond destStream.Length if the stream grew during editing.
            // ToArray() would write exactly Length bytes at the cost of one extra copy.
            File.WriteAllBytes(DestPath, destStream.GetBuffer());
        }

        [Benchmark]
        public void DoWorkUsingCopyFileStreamToMemoryStream()
        {
            using MemoryStream destStream = FileCloner.CopyFileStreamToMemoryStream(SourcePath);

            using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true))
            {
                ChangeWordprocessingDocument(wordDocument);
            }

            File.WriteAllBytes(DestPath, destStream.GetBuffer());
        }

        [Benchmark]
        public void DoWorkUsingCopyFileStreamToFileStream()
        {
            using FileStream destStream = FileCloner.CopyFileStreamToFileStream(SourcePath, DestPath);
            using WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true);
            ChangeWordprocessingDocument(wordDocument);
        }

        [Benchmark]
        public void DoWorkUsingCopyFileAndOpenFileStream()
        {
            using FileStream destStream = FileCloner.CopyFileAndOpenFileStream(SourcePath, DestPath);
            using WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true);
            ChangeWordprocessingDocument(wordDocument);
        }

        [Benchmark]
        public void DoWorkCloningOpenXmlPackage()
        {
            using WordprocessingDocument sourceWordDocument = WordprocessingDocument.Open(SourcePath, false);
            using var wordDocument = (WordprocessingDocument) sourceWordDocument.Clone(DestPath, true);
            ChangeWordprocessingDocument(wordDocument);
        }

        #endregion
    }
}
The benchmark above is run as follows:
using BenchmarkDotNet.Running;
using CodeSnippets.Benchmarks.IO;

namespace CodeSnippets.Benchmarks
{
    public static class Program
    {
        public static void Main()
        {
            BenchmarkRunner.Run<FileClonerBenchmark>();
        }
    }
}
And what are the results on my machine? Which method is the fastest?
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-7500U CPU 2.70GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
| Method                                  | ParaCount |      Mean |     Error |    StdDev |    Median | Ratio |
| --------------------------------------- | --------- | --------: | --------: | --------: | --------: | ----: |
| DoWorkUsingReadAllBytesToMemoryStream   | 1         |  1.548 ms | 0.0298 ms | 0.0279 ms |  1.540 ms |  1.00 |
| DoWorkUsingCopyFileStreamToMemoryStream | 1         |  1.561 ms | 0.0305 ms | 0.0271 ms |  1.556 ms |  1.01 |
| DoWorkUsingCopyFileStreamToFileStream   | 1         |  2.394 ms | 0.0601 ms | 0.1100 ms |  2.354 ms |  1.55 |
| DoWorkUsingCopyFileAndOpenFileStream    | 1         |  3.302 ms | 0.0657 ms | 0.0855 ms |  3.312 ms |  2.12 |
| DoWorkCloningOpenXmlPackage             | 1         |  4.567 ms | 0.1218 ms | 0.3591 ms |  4.557 ms |  3.13 |
|                                         |           |           |           |           |           |       |
| DoWorkUsingReadAllBytesToMemoryStream   | 10        |  1.737 ms | 0.0337 ms | 0.0361 ms |  1.742 ms |  1.00 |
| DoWorkUsingCopyFileStreamToMemoryStream | 10        |  1.752 ms | 0.0347 ms | 0.0571 ms |  1.739 ms |  1.01 |
| DoWorkUsingCopyFileStreamToFileStream   | 10        |  2.505 ms | 0.0390 ms | 0.0326 ms |  2.500 ms |  1.44 |
| DoWorkUsingCopyFileAndOpenFileStream    | 10        |  3.532 ms | 0.0731 ms | 0.1860 ms |  3.455 ms |  2.05 |
| DoWorkCloningOpenXmlPackage             | 10        |  4.446 ms | 0.0880 ms | 0.1470 ms |  4.424 ms |  2.56 |
|                                         |           |           |           |           |           |       |
| DoWorkUsingReadAllBytesToMemoryStream   | 100       |  2.847 ms | 0.0563 ms | 0.0553 ms |  2.857 ms |  1.00 |
| DoWorkUsingCopyFileStreamToMemoryStream | 100       |  2.865 ms | 0.0561 ms | 0.0786 ms |  2.868 ms |  1.02 |
| DoWorkUsingCopyFileStreamToFileStream   | 100       |  3.550 ms | 0.0697 ms | 0.0881 ms |  3.570 ms |  1.25 |
| DoWorkUsingCopyFileAndOpenFileStream    | 100       |  4.456 ms | 0.0877 ms | 0.0861 ms |  4.458 ms |  1.57 |
| DoWorkCloningOpenXmlPackage             | 100       |  5.958 ms | 0.1242 ms | 0.2727 ms |  5.908 ms |  2.10 |
|                                         |           |           |           |           |           |       |
| DoWorkUsingReadAllBytesToMemoryStream   | 1000      | 12.378 ms | 0.2453 ms | 0.2519 ms | 12.442 ms |  1.00 |
| DoWorkUsingCopyFileStreamToMemoryStream | 1000      | 12.538 ms | 0.2070 ms | 0.1835 ms | 12.559 ms |  1.02 |
| DoWorkUsingCopyFileStreamToFileStream   | 1000      | 12.919 ms | 0.2457 ms | 0.2298 ms | 12.939 ms |  1.05 |
| DoWorkUsingCopyFileAndOpenFileStream    | 1000      | 13.728 ms | 0.2803 ms | 0.5196 ms | 13.652 ms |  1.11 |
| DoWorkCloningOpenXmlPackage             | 1000      | 16.868 ms | 0.2174 ms | 0.1927 ms | 16.801 ms |  1.37 |
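It turns out that DoWorkUsingReadAllBytesToMemoryStream() is consistently the fastest method. However, the margin to DoWorkUsingCopyFileStreamToMemoryStream() (a ratio of 1.01 to 1.02) is easily within the margin of error. This means that you should open your Open XML documents on a MemoryStream for processing whenever possible. And if you don't have to store the resulting document in the file system, this will be even faster than unnecessarily using a FileStream.

Wherever an output FileStream is involved, you see a more "significant" difference (bearing in mind that a millisecond can make a difference if you process large numbers of documents). You should also note that using File.Copy() is actually not such a good approach: in this benchmark it was slower than copying from FileStream to FileStream at every document size.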
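Finally, using the OpenXmlPackage.Clone() method or one of its overloads turns out to be the slowest method. That is because it involves more elaborate logic than just copying bytes. However, if all you have is a reference to an OpenXmlPackage (or, effectively, one of its subclasses), the Clone() method and its overloads are your best choice.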
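For the case where all you have is an OpenXmlPackage reference and you don't need the clone in the file system, here is a minimal sketch that clones onto a MemoryStream instead of a path. It assumes the Clone(Stream, bool) overload of the Open XML SDK version you are using; the class name and the `CloneToMemoryStream` helper are hypothetical and not part of the SDK or of the FileCloner class. This variant was not part of the benchmark above.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class OpenXmlPackageCloner
{
    // Hypothetical helper: clones the given package (e.g., a WordprocessingDocument,
    // SpreadsheetDocument, or PresentationDocument) onto a new MemoryStream.
    public static MemoryStream CloneToMemoryStream(OpenXmlPackage package)
    {
        var destStream = new MemoryStream();

        // Clone(Stream, bool) writes the clone to destStream and returns a package
        // opened on that stream, here in read-write mode. Disposing the clone
        // flushes pending changes to destStream; the stream itself stays open.
        using (OpenXmlPackage clone = package.Clone(destStream, true))
        {
            // Make any changes to the clone here before it is disposed.
        }

        destStream.Seek(0, SeekOrigin.Begin);
        return destStream;
    }
}
```

The returned stream can then be opened again with, for example, WordprocessingDocument.Open(stream, true), just like the streams returned by the FileCloner methods.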
You can find the full source code in my CodeSnippets GitHub repository. Have a look at the CodeSnippets.Benchmark project and the FileCloner class.