I wrote a .NET application that takes a large number (>10,000) of very small files from one directory and organizes them into a directory tree of the form root/yyyy/MM/dd/file.name. The application works and is fast for a small number (<1,000) of files, but the more files I have to move, the longer it takes. I'm a newbie at .NET and C#, but could something like running the move in parallel make it go faster? Or compressing the files in batches before the move? Ultimately, I'm trying to avoid the program stalling or failing when the number of files to move gets too large.
Here is the code I'm using:
```csharp
using System;
using System.IO;
using System.Configuration;

namespace consolefilemover
{
    internal class Program
    {
        static void Main(string[] args)
        {
            string rootDir = ConfigurationManager.AppSettings["rootDir"];
            string[] files = Directory.GetFiles(rootDir);
            string log = "auditlog.txt";

            foreach (string filePath in files)
            {
                FileInfo fileInfo = new FileInfo(filePath);
                DateTime lastModifiedDate = fileInfo.LastWriteTime;

                // Create the destination directory path based on the last modified date
                string destinationDir = Path.Combine(rootDir, lastModifiedDate.ToString("yyyy"), lastModifiedDate.ToString("MM"), lastModifiedDate.ToString("dd"));

                // Create the destination directory if it doesn't exist
                Directory.CreateDirectory(destinationDir);

                // Move the file to the destination directory
                string destinationFilePath = Path.Combine(destinationDir, Path.GetFileName(filePath));
                if (!File.Exists(destinationFilePath))
                {
                    File.Move(filePath, destinationFilePath);
                }
                else
                {
                    destinationFilePath = Path.Combine(destinationDir, Path.GetFileNameWithoutExtension(filePath) + DateTime.Now.ToString("yyyyMMddHHmmss") + Path.GetExtension(filePath));
                    File.Move(filePath, destinationFilePath);
                }

                // Making a location for monthly audit logs
                string logfile = Path.Combine(rootDir, lastModifiedDate.ToString("yyyy"), lastModifiedDate.ToString("MM"), log);

                // Define the data for the log file
                string logInfo = Environment.NewLine + lastModifiedDate.ToString("yyyyMMdd") + " | Source: " + filePath + " | dest: " + destinationFilePath;

                // Creates Log File
                File.AppendAllText(logfile, logInfo);
            }
        }
    }
}
```
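First, replace `Directory.GetFiles` with `Directory.EnumerateFiles`. `GetFiles` builds an array holding every file path before the loop starts; `EnumerateFiles` hands the paths to the loop one at a time, so you avoid that large array allocation.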
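A minimal sketch of that change, wrapped in a helper whose names (`FileSource`, `TopLevelFiles`) are hypothetical and only for illustration; in the question's program you could just as well call `Directory.EnumerateFiles(rootDir)` directly in the `foreach`:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original program.
static class FileSource
{
    // Lazily yields the full path of each file directly inside rootDir.
    // Unlike Directory.GetFiles, no string[] with every path is built up front;
    // each path is produced only when the foreach loop asks for the next one.
    public static IEnumerable<string> TopLevelFiles(string rootDir) =>
        Directory.EnumerateFiles(rootDir, "*", SearchOption.TopDirectoryOnly);
}
```

Usage: `foreach (string filePath in FileSource.TopLevelFiles(rootDir)) { ... }` with the loop body unchanged. Because the search is `TopDirectoryOnly` and every file is moved into a subdirectory, moved files should not be yielded a second time during the same run.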
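Then keep a `HashSet` of the directories you have already created and check it before calling `Directory.CreateDirectory`. If the destination directories repeat often (many files share the same date), this will save you a lot of I/O.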
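A sketch of such a cache, assuming a Windows host (hence the case-insensitive comparer; the date-based paths here are digits anyway). `DirectoryCache`, `EnsureExists` and `CreateCalls` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: remembers which destination directories were already
// created during this run, so Directory.CreateDirectory runs once per directory.
sealed class DirectoryCache
{
    // Case-insensitive because Windows paths are (assumption: Windows host).
    private readonly HashSet<string> _created =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Number of times the filesystem was actually touched (useful for checking).
    public int CreateCalls { get; private set; }

    public void EnsureExists(string path)
    {
        // Already created in this run: skip the filesystem call entirely.
        if (_created.Contains(path))
            return;

        Directory.CreateDirectory(path);
        // Record the path only after CreateDirectory succeeded, so a failed
        // attempt is retried the next time this directory comes up.
        _created.Add(path);
        CreateCalls++;
    }
}
```

In the question's program you would create one `DirectoryCache` before the `foreach` loop and call `cache.EnsureExists(destinationDir)` in place of `Directory.CreateDirectory(destinationDir)`.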
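Also replace the string concatenation
`Path.GetFileNameWithoutExtension(filePath) + DateTime.Now.ToString("yyyyMMddHHmmss") + Path.GetExtension(filePath)`
with an interpolated string that uses a format specifier for the timestamp:
`$"{Path.GetFileNameWithoutExtension(filePath)}{DateTime.Now:yyyyMMddHHmmss}{Path.GetExtension(filePath)}"`
The saving is modest: `a + b + c` on strings already compiles to a single `String.Concat` call, so it creates no intermediate strings. What interpolation adds, on C# 10 / .NET 6 or later, is that the `:yyyyMMddHHmmss` specifier lets the timestamp be formatted directly into the result instead of first becoming a separate string via `ToString`.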
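A small sketch of the interpolated version as a standalone method. `FallbackName.Build` is a hypothetical name, and the timestamp is passed in as a parameter (rather than read from `DateTime.Now` inside the method) so the result is easy to check:

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original program.
static class FallbackName
{
    // Builds "<name><yyyyMMddHHmmss><extension>" for the case where the
    // destination file already exists. The format specifier inside the
    // interpolation hole replaces the explicit ToString("yyyyMMddHHmmss") call.
    public static string Build(string filePath, DateTime stamp) =>
        $"{Path.GetFileNameWithoutExtension(filePath)}{stamp:yyyyMMddHHmmss}{Path.GetExtension(filePath)}";
}
```

In the loop you would call `Path.Combine(destinationDir, FallbackName.Build(filePath, DateTime.Now))`. As in the original code, two name clashes within the same second would still produce the same fallback name.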