We receive a large number of records from a remote server as an IAsyncEnumerable. After a simple transformation and JSON serialization, we upload the result (a JSON file) to Azure Blob Storage.
Is there any way to improve this code to reduce peak memory usage? The number of records coming from the remote server is large.
public async Task ExecuteAsync()
{
    var items = GetItems();

    using var memoryStream = new MemoryStream();
    await JsonSerializer.SerializeAsync(memoryStream, items);

    var blobContainerClient = new Azure.Storage.Blobs.BlobContainerClient("connectionString", "container");
    var blobClient = blobContainerClient.GetBlobClient("blobName");

    memoryStream.Position = 0;
    await blobClient.UploadAsync(memoryStream, overwrite: true);
}
private static async IAsyncEnumerable<Item> GetItems()
{
    // This is calling a remote server that returns IAsyncEnumerable<Row>
    await foreach (var row in GetIAsyncEnumerable())
    {
        yield return new Item(row);
    }
}
Everything looks fine up to the JSON serialization: we receive the records one by one, converting and serializing each object as it arrives. But with
await JsonSerializer.SerializeAsync(memoryStream, items)
we wait for all items to be serialized and materialized in memory before the upload even starts. Is there a way to keep this
IAsyncEnumerable
chain going, so that the upload begins as soon as the first item arrives and continues as each new object comes in?
You can open a stream to the blob directly and serialize the items into it:
public async Task ExecuteAsync()
{
    var blobContainerClient = new BlobContainerClient("connectionString", "container");
    var blobClient = blobContainerClient.GetBlobClient("blobName");

    var items = GetItems();

    await using var blobStream = await blobClient.OpenWriteAsync(overwrite: true);
    await JsonSerializer.SerializeAsync(blobStream, items);
}
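If peak memory is still a concern, note that the SDK's write stream itself stages blocks in memory before uploading them. Assuming Azure.Storage.Blobs v12, the staging buffer can be tuned through BlobOpenWriteOptions; the 1 MiB value below is purely illustrative:

```csharp
using Azure.Storage.Blobs.Models;

// Sketch, assuming Azure.Storage.Blobs v12: BufferSize controls how many
// bytes OpenWriteAsync stages in memory before committing each block.
// Smaller buffers lower peak memory at the cost of more upload requests.
var options = new BlobOpenWriteOptions
{
    BufferSize = 1 * 1024 * 1024 // stage ~1 MiB per block; illustrative value
};
await using var blobStream = await blobClient.OpenWriteAsync(overwrite: true, options);
```

This is a trade-off knob rather than a fix: the serialized data still flows through the same stream, but only one staging buffer's worth is held at a time.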
To write items out as they become available, you need something like this:
public async Task ExecuteAsync()
{
    var blobContainerClient = new BlobContainerClient("connectionString", "container");
    var blobClient = blobContainerClient.GetBlobClient("blobName");

    var items = GetItems();

    await using var blobStream = await blobClient.OpenWriteAsync(overwrite: true);
    await using var writer = new Utf8JsonWriter(blobStream, new JsonWriterOptions { Indented = true });

    writer.WriteStartArray();
    await foreach (var item in items)
    {
        JsonSerializer.Serialize(writer, item);
        await writer.FlushAsync();
    }
    writer.WriteEndArray();
    await writer.FlushAsync();
}
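Flushing after every single item can be chatty for small objects. A minimal variation is to flush only once the writer's pending buffer crosses a threshold, using Utf8JsonWriter.BytesPending. The sketch below demonstrates the pattern against a MemoryStream so it is self-contained; with the blob stream the loop is identical (GetNumbersAsync and the 16 KiB threshold are illustrative assumptions, not part of the original code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Async iterator standing in for the remote IAsyncEnumerable source.
static async IAsyncEnumerable<int> GetNumbersAsync()
{
    for (var i = 0; i < 1000; i++)
    {
        await Task.Yield(); // simulate items arriving asynchronously
        yield return i;
    }
}

await using var stream = new MemoryStream();
await using var writer = new Utf8JsonWriter(stream);

writer.WriteStartArray();
await foreach (var number in GetNumbersAsync())
{
    JsonSerializer.Serialize(writer, number);

    // Flush in ~16 KiB chunks instead of once per item.
    if (writer.BytesPending > 16 * 1024)
    {
        await writer.FlushAsync();
    }
}
writer.WriteEndArray();
await writer.FlushAsync();

Console.WriteLine(stream.Length); // total serialized size in bytes
```

The threshold bounds how much serialized JSON sits in the writer's buffer at any moment, so memory stays flat regardless of how many items the source yields.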