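I have XML stored in a char array (char[]), and the content length of the data is held in an int variable. I need to deserialize the data using XmlSerializer.
For performance reasons, I need to avoid allocating a string object, because the data is usually larger than 85 KB and would result in a Gen 2 object.
Is there any way to pass a char[] to XmlSerializer without converting it to a string? It accepts a Stream or a TextReader, but I can't find a way to construct either one from a char[].
I'm imagining something like this (except that C# has no CharArrayStream or CharArrayReader):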
public MyEntity DeserializeXmlDocument(char[] buffer, int contentLength)
{
    using (var stream = new CharArrayStream(buffer, contentLength))
    {
        return _xmlSerializer.Deserialize(stream) as MyEntity;
    }
}
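As further information: we are profiling existing code and have identified this as a pain point, so this is not a case of "premature optimization" or an "XY problem".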
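For anyone who wants to check the 85 KB concern on their own runtime, here is a minimal sketch. The class and method names are made up for illustration. It assumes the usual CLR behaviour, where objects of roughly 85,000 bytes or more go to the large object heap and are reported as the highest generation.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class LohDemo
{
    // Returns the GC generation reported for a freshly allocated string
    // containing the given number of chars (two bytes per char).
    public static int GenerationOfNewString(int charCount)
    {
        var s = new string('x', charCount);
        return GC.GetGeneration(s);
    }
}
```

Calling LohDemo.GenerationOfNewString(50_000) allocates about 100 KB of character data. On the runtimes I know of, the result should equal GC.MaxGeneration, whereas a short string should report generation 0.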
I reworked the code that @GyörgyKőszeg linked to into a CharArrayStream class. So far it is working in my tests:
using System;
using System.IO;

public class CharArrayStream : Stream
{
    private readonly char[] str;
    private readonly int n;

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => n;
    public override long Position { get; set; } // TODO: bounds check

    public CharArrayStream(char[] str, int n)
    {
        this.str = str;
        this.n = n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin:
                Position = offset;
                break;
            case SeekOrigin.Current:
                Position += offset;
                break;
            case SeekOrigin.End:
                // Offsets from the end are added to Length (they are normally negative or zero).
                Position = Length + offset;
                break;
        }
        return Position;
    }

    // Note: this keeps only the low byte of each char.
    private byte this[int i] => (byte)str[i];

    public override int Read(byte[] buffer, int offset, int count)
    {
        // TODO: bounds check
        var len = Math.Min(count, Length - Position);
        for (int i = 0; i < len; i++)
        {
            buffer[offset++] = this[(int)(Position++)];
        }
        return (int)len;
    }

    public override int ReadByte() => Position >= Length ? -1 : this[(int)Position++];
    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override string ToString() => throw new NotSupportedException();
}
I can use it like this:
public MyEntity DeserializeXmlDocument(char[] buffer, int contentLength)
{
    using (var stream = new CharArrayStream(buffer, contentLength))
    {
        return _xmlSerializer.Deserialize(stream) as MyEntity;
    }
}
Thanks, @GyörgyKőszeg!
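It is fairly easy to subclass TextReader so that it reads from a char array or an equivalent. Here is a version that takes a ReadOnlyMemory<char>, which can represent a slice of either a string or a char[] character array: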
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class CharMemoryReader : TextReader
{
    private ReadOnlyMemory<char> chars;
    private int position; // A negative position marks the reader as closed.

    public CharMemoryReader(ReadOnlyMemory<char> chars)
    {
        this.chars = chars;
        this.position = 0;
    }

    void CheckClosed()
    {
        if (position < 0)
            throw new ObjectDisposedException(null, string.Format("{0} is closed.", ToString()));
    }

    public override void Close() => Dispose(true);

    protected override void Dispose(bool disposing)
    {
        chars = ReadOnlyMemory<char>.Empty;
        position = -1;
        base.Dispose(disposing);
    }

    public override int Peek()
    {
        CheckClosed();
        return position >= chars.Length ? -1 : chars.Span[position];
    }

    public override int Read()
    {
        CheckClosed();
        return position >= chars.Length ? -1 : chars.Span[position++];
    }

    public override int Read(char[] buffer, int index, int count)
    {
        CheckClosed();
        if (buffer == null)
            throw new ArgumentNullException(nameof(buffer));
        if (index < 0)
            throw new ArgumentOutOfRangeException(nameof(index));
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (buffer.Length - index < count)
            throw new ArgumentException("buffer.Length - index < count");
        return Read(buffer.AsSpan().Slice(index, count));
    }

    public override int Read(Span<char> buffer)
    {
        CheckClosed();
        var nRead = chars.Length - position;
        if (nRead > 0)
        {
            if (nRead > buffer.Length)
                nRead = buffer.Length;
            chars.Span.Slice(position, nRead).CopyTo(buffer);
            position += nRead;
        }
        return nRead;
    }

    public override string ReadToEnd()
    {
        CheckClosed();
        var s = position == 0 ? chars.ToString() : chars.Slice(position, chars.Length - position).ToString();
        position = chars.Length;
        return s;
    }

    public override string ReadLine()
    {
        CheckClosed();
        var span = chars.Span;
        var i = position;
        for ( ; i < span.Length; i++)
        {
            var ch = span[i];
            if (ch == '\r' || ch == '\n')
            {
                var result = span.Slice(position, i - position).ToString();
                position = i + 1;
                // Treat "\r\n" as a single line terminator.
                if (ch == '\r' && position < span.Length && span[position] == '\n')
                    position++;
                return result;
            }
        }
        if (i > position)
        {
            var result = span.Slice(position, i - position).ToString();
            position = i;
            return result;
        }
        return null;
    }

    public override int ReadBlock(char[] buffer, int index, int count) => Read(buffer, index, count);
    public override int ReadBlock(Span<char> buffer) => Read(buffer);

    // All data is already in memory, so the async overloads complete synchronously.
    public override Task<String> ReadLineAsync() => Task.FromResult(ReadLine());
    public override Task<String> ReadToEndAsync() => Task.FromResult(ReadToEnd());
    public override Task<int> ReadBlockAsync(char[] buffer, int index, int count) => Task.FromResult(ReadBlock(buffer, index, count));
    public override Task<int> ReadAsync(char[] buffer, int index, int count) => Task.FromResult(Read(buffer, index, count));
    public override ValueTask<int> ReadBlockAsync(Memory<char> buffer, CancellationToken cancellationToken = default) =>
        cancellationToken.IsCancellationRequested ? new ValueTask<int>(Task.FromCanceled<int>(cancellationToken)) : new ValueTask<int>(ReadBlock(buffer.Span));
    public override ValueTask<int> ReadAsync(Memory<char> buffer, CancellationToken cancellationToken = default) =>
        cancellationToken.IsCancellationRequested ? new ValueTask<int>(Task.FromCanceled<int>(cancellationToken)) : new ValueTask<int>(Read(buffer.Span));
}
Then use it with one of the following extension methods:
using System;
using System.Xml.Serialization;

public static partial class XmlSerializationHelper
{
    public static T LoadFromXml<T>(this char[] xml, int contentLength, XmlSerializer serial = null) =>
        new ReadOnlyMemory<char>(xml, 0, contentLength).LoadFromXml<T>(serial);

    public static T LoadFromXml<T>(this ReadOnlyMemory<char> xml, XmlSerializer serial = null)
    {
        serial = serial ?? new XmlSerializer(typeof(T));
        using (var reader = new CharMemoryReader(xml))
            return (T)serial.Deserialize(reader);
    }
}
For example:
var result = buffer.LoadFromXml<MyEntity>(contentLength, _xmlSerializer);
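To illustrate why ReadOnlyMemory<char> is a convenient input type here, the following small sketch shows that both a partially filled char[] buffer and a string can be viewed through it without copying. The MemoryViewDemo class and its View methods are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class MemoryViewDemo
{
    // Views the first `length` chars of a (possibly oversized) buffer without copying.
    public static ReadOnlyMemory<char> View(char[] buffer, int length) =>
        new ReadOnlyMemory<char>(buffer, 0, length);

    // Views an existing string as memory, again without copying.
    public static ReadOnlyMemory<char> View(string text) => text.AsMemory();
}
```

Either result could then be handed to a reader that accepts ReadOnlyMemory<char>, so one reader type should cover both sources.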
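Notes:
A char[] character array has essentially the same contents as a UTF-16 encoded memory stream without a BOM, so one could create a custom Stream implementation, resembling MemoryStream, that represents each char as two bytes, as is done in this answer to How do I generate a stream from a string? by György Kőszeg. Doing this entirely correctly seems somewhat tricky, however, because getting all the async methods right does not look easy.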
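Having done so, XmlReader would still need to wrap the custom stream with a StreamReader that "decodes" the stream into a sequence of characters, correctly inferring the encoding in the process (which I have observed can sometimes go wrong, e.g. when the encoding stated in the XML declaration does not match the actual encoding).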
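I chose to create a custom TextReader rather than a custom Stream to avoid the unnecessary decoding step, and because the async implementation seemed less burdensome.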
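As a rough illustration of the two-bytes-per-char alternative described in these notes, here is a minimal, forward-only sketch. The name Utf16CharArrayStream and the DecodeAll helper are hypothetical and are not taken from the linked answer. The sketch supports neither seeking nor the async overloads, and it assumes little-endian byte order.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical name: a read-only, non-seekable stream that exposes each char
// as two bytes in little-endian (UTF-16LE) order, with no BOM.
public sealed class Utf16CharArrayStream : Stream
{
    private readonly char[] chars;
    private readonly long byteLength;
    private long bytePosition;

    public Utf16CharArrayStream(char[] chars, int charCount)
    {
        this.chars = chars ?? throw new ArgumentNullException(nameof(chars));
        if (charCount < 0 || charCount > chars.Length)
            throw new ArgumentOutOfRangeException(nameof(charCount));
        byteLength = 2L * charCount;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => byteLength;
    public override long Position
    {
        get => bytePosition;
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        if (buffer.Length - offset < count) throw new ArgumentException("Invalid offset/count.");

        int n = (int)Math.Min(count, byteLength - bytePosition);
        for (int i = 0; i < n; i++, bytePosition++)
        {
            char c = chars[(int)(bytePosition / 2)];
            // Even byte offsets carry the low byte, odd offsets the high byte.
            buffer[offset + i] = (bytePosition & 1) == 0 ? (byte)(c & 0xFF) : (byte)(c >> 8);
        }
        return n;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    // Round-trips the bytes through a StreamReader that is told the encoding explicitly.
    public static string DecodeAll(char[] chars, int charCount)
    {
        using (var stream = new Utf16CharArrayStream(chars, charCount))
        using (var reader = new StreamReader(stream, Encoding.Unicode))
            return reader.ReadToEnd();
    }
}
```

Note that DecodeAll converts the bytes straight back into characters, which is the extra decoding step that a custom TextReader avoids.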
Representing each char as a single byte via truncation (e.g. (byte)str[i]) will corrupt XML containing any multibyte characters.
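The truncation problem can be seen with a short sketch. TruncationDemo is a hypothetical helper that mimics the (byte)str[i] cast by keeping only the low eight bits of each UTF-16 code unit.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class TruncationDemo
{
    // Keeps only the low byte of each char, as a (byte) cast on a char does
    // in an unchecked context.
    public static byte[] Truncate(string text)
    {
        var bytes = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
            bytes[i] = unchecked((byte)text[i]);
        return bytes;
    }
}
```

For example, the euro sign '€' is U+20AC, so TruncationDemo.Truncate("€") yields the single byte 0xAC and the high byte 0x20 is lost, while an ASCII character such as 'A' survives unchanged as 0x41.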
I have not done any performance tuning of the above implementation.
Demo fiddle here.