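I receive files via JSON that bundle any number of PDF files. To convert them to PDFs, I have to split the file and remove the backslash characters. This works for the usual bundle sizes, which are typically tens of megabytes. Recently I started getting files 10 times that size, and the program crashes with an out-of-memory error. Memory usage at that point is under 3 GB, and my computer has 32 GB.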
temp = buffer.IndexOf("FileData");
while (temp > 0)
{
    docBuffer = buffer;
    int gen = 1;
    string curFile = wdirl + outFileZ + "AAA" + gen.ToString("D3") + ".pdf";
    FileStream strmFileA = File.Create(curFile);
    docBuffer = docBuffer.Substring(temp + 11);
    temp = docBuffer.IndexOf('"');
    buffer = docBuffer.Substring(temp);
    docBuffer = docBuffer.Substring(0, temp);
    // string docBufferA = docBuffer.Replace("\\", string.Empty);
    StringBuilder docBufferA = new StringBuilder(docBuffer);
    docBufferA.Replace("\\", "");
    docBuffer = docBufferA.ToString();
    bytes = Convert.FromBase64String(docBuffer);
    writer = new BinaryWriter(strmFileA);
    writer.Write(bytes, 0, bytes.Length);
    writer.Close();
    temp = buffer.IndexOf("FileData");
}
I tried using StringBuilder when removing the backslashes, which postponed the problem for a while.
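Strings in C# are immutable. This means that every call to a method such as Substring creates a new string in memory. With large strings, this behavior can consume a lot of memory very quickly.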
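To illustrate the cost of those copies, here is a minimal sketch that locates each document inside the buffer by offset instead of re-slicing the buffer with Substring. The class and method names (FileDataScanner, FindValues) are hypothetical, and the marker layout `FileData":"` is an assumption inferred from the `temp + 11` offset in the question's code.

```csharp
using System;
using System.Collections.Generic;

public static class FileDataScanner
{
    // Returns the (start, length) of every quoted value that follows the
    // marker, without allocating any substrings of the large buffer.
    public static List<(int Start, int Length)> FindValues(string buffer)
    {
        // Assumed layout: FileData":"<base64 text>"  (marker is 11 characters)
        const string marker = "FileData\":\"";
        var spans = new List<(int Start, int Length)>();
        int searchFrom = 0;

        while (true)
        {
            int hit = buffer.IndexOf(marker, searchFrom, StringComparison.Ordinal);
            if (hit < 0) break;                 // no more documents

            int start = hit + marker.Length;    // first character of the value
            int end = buffer.IndexOf('"', start);
            if (end < 0) break;                 // unterminated value, stop scanning

            spans.Add((start, end - start));
            searchFrom = end + 1;               // continue after the closing quote
        }

        return spans;
    }
}
```

Each returned pair can then be processed in small pieces, so the only large string that stays alive is the original buffer.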
To address this, try the following approaches:
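Stream the JSON: Instead of loading the entire JSON file into memory at once, use a streaming parser such as JsonTextReader from the Newtonsoft.Json library. This lets you process the JSON content piece by piece without loading the whole thing into memory.
Directly Write to File Stream: Instead of building the entire decoded byte array in memory, you can write the decoded bytes directly to a file stream.
using System;
using Newtonsoft.Json;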
using System.IO;
// Open the JSON file for reading.
using (StreamReader file = File.OpenText(jsonFilePath))
using (JsonTextReader reader = new JsonTextReader(file))
{
    while (reader.Read())
    {
        if (reader.Value != null && reader.TokenType == JsonToken.PropertyName && (string)reader.Value == "FileData")
        {
            // Move to the next token, which should be the file data.
            reader.Read();
            string fileData = (string)reader.Value;
            // Open a FileStream for writing the decoded bytes.
            // Note: outputFilePath must differ for each document, or later files overwrite earlier ones.
            using (FileStream fs = new FileStream(outputFilePath, FileMode.Create))
            {
                // Decode the Base64 string in chunks. The chunk size must be a
                // multiple of 4 characters, because Base64 decodes in groups of 4.
                int bufferSize = 4 * 1024; // 4096 characters. Adjust this value based on your needs.
                int position = 0;
                while (position < fileData.Length)
                {
                    int length = Math.Min(bufferSize, fileData.Length - position);
                    string chunk = fileData.Substring(position, length);
                    // Remove backslashes from the current chunk. JsonTextReader has already
                    // unescaped sequences such as \/, so this is normally a no-op. If it did
                    // remove characters, the chunk would no longer be a multiple of 4 and
                    // Convert.FromBase64String would throw a FormatException.
                    chunk = chunk.Replace("\\", "");
                    // Decode the chunk and write to the FileStream.
                    byte[] bytes = Convert.FromBase64String(chunk);
                    fs.Write(bytes, 0, bytes.Length);
                    position += length;
                }
            }
        }
    }
}
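One caveat with chunked decoding is that stripping characters from a chunk can leave it with a length that is not a multiple of 4, which Convert.FromBase64String rejects. The following is a minimal sketch of one way to handle that, by carrying leftover characters over to the next chunk. The class and method names (Base64ChunkDecoder, DecodeTo) are hypothetical, and it assumes that backslashes and whitespace are the only characters that need to be dropped.

```csharp
using System;
using System.IO;
using System.Text;

public static class Base64ChunkDecoder
{
    // Decodes Base64 text that may contain stray backslashes or whitespace,
    // writing the decoded bytes to 'output' one bounded chunk at a time.
    public static void DecodeTo(string data, Stream output, int chunkChars = 4096)
    {
        if (chunkChars <= 0) throw new ArgumentOutOfRangeException(nameof(chunkChars));

        // Cleaned characters that have not been decoded yet.
        var pending = new StringBuilder(chunkChars + 3);

        for (int position = 0; position < data.Length; position += chunkChars)
        {
            int end = Math.Min(position + chunkChars, data.Length);

            // Copy this slice character by character, skipping unwanted
            // characters, so no Substring or Replace copies are created.
            for (int i = position; i < end; i++)
            {
                char c = data[i];
                if (c == '\\' || char.IsWhiteSpace(c)) continue;
                pending.Append(c);
            }

            // Decode only whole 4-character groups; keep the rest for later.
            int usable = pending.Length - (pending.Length % 4);
            if (usable > 0)
            {
                byte[] bytes = Convert.FromBase64String(pending.ToString(0, usable));
                output.Write(bytes, 0, bytes.Length);
                pending.Remove(0, usable);
            }
        }

        // Well-formed Base64 leaves nothing behind once all chunks are processed.
        if (pending.Length > 0)
            throw new FormatException("Base64 input length is not a multiple of 4 after cleaning.");
    }
}
```

With this helper, the inner loop of the answer's code could be replaced by a call such as Base64ChunkDecoder.DecodeTo(fileData, fs). The temporary strings it creates are at most one chunk long, so peak memory stays close to the size of fileData itself.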