I am executing the following code:
byte[] outbytes = File.ReadAllBytes(sourcefile).Skip(offset).Take(size).ToArray();
File.WriteAllBytes(outfile, outbytes);
But each step is limited to around 2 GB of data.

Edit: the extracted bytes can also be larger than 2 GB.

How can I handle large files? What is the best way to do this while keeping the best possible performance, regardless of file size? Thx!
It is best to stream the data from one file to the other, loading only small portions of it into memory at a time:
public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
    // Open the files as streams
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        // Seek to the start position
        inStream.Seek(startPosition, SeekOrigin.Begin);

        // Create a variable to track how much more to copy
        // and a buffer to temporarily store a section of the file
        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // Read the smaller of 81920 or remaining, and break out
            // of the loop if we've already reached the end of the file
            int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }

            // Write the buffered bytes to the output file
            outStream.Write(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}
Usage:
CopyFileSection(sourcefile, outfile, offset, size);
This should be functionally equivalent to your current approach, without the overhead of reading the whole file, whatever its size, into memory.
Note: if you are doing this in code that uses async/await, you should change CopyFileSection to public static async Task CopyFileSection, and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
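Putting that note into code, an async variant might look like this (a sketch mirroring the synchronous method above; the name CopyFileSectionAsync is my own choice, not from the original answer):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
{
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        inStream.Seek(startPosition, SeekOrigin.Begin);

        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // Same logic as the synchronous version, but awaiting the I/O calls
            int bytesRead = await inStream.ReadAsync(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }

            await outStream.WriteAsync(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}
```

Usage: `await CopyFileSectionAsync(sourcefile, outfile, offset, size);`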
An example using FileStream to take the middle 3 GB out of a 5 GB file:
byte[] buffer = new byte[1024 * 1024];
using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
    readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // seek to 1 GB in
    for (int i = 0; i < 3000; i++) // 3000 reads of one megabyte ≈ 3 GB
    {
        int bytesRead = readFS.Read(buffer, 0, buffer.Length);
        writeFS.Write(buffer, 0, bytesRead);
    }
}
This is not production-grade code; a Read may return less than a full megabyte, so you would end up with less than 3 GB copied. It is more to demonstrate the concept of using two file streams and repeatedly reading from one and writing to the other. I'm sure you can modify it to read an exact number of bytes by keeping a running total of the bytes read in the loop and stopping once enough have been read.
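Following that suggestion, the loop could track how many bytes remain and request only that much on the final read (a sketch based on the example above, not the author's code; `pathToBigFile` and `pathToNewFile` are placeholder paths):

```csharp
using System;
using System.IO;

byte[] buffer = new byte[1024 * 1024];
using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
    readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // skip the first 1 GB
    long remaining = 3L * 1024 * 1024 * 1024;           // copy exactly 3 GB
    while (remaining > 0)
    {
        // Never ask for more than is left to copy, so the last read is exact
        int bytesRead = readFS.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
        if (bytesRead == 0) { break; } // end of file reached early

        writeFS.Write(buffer, 0, bytesRead);
        remaining -= bytesRead;
    }
}
```

This handles short reads correctly: if Read returns less than a megabyte, the loop simply continues until `remaining` reaches zero.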