Passing a char[] buffer to XmlSerializer

Question (votes: 2, answers: 2)

I have an XML document stored in a char array (char[]), and I have the content length of the data in an int variable. I need to deserialize the data using XmlSerializer.

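For performance reasons I need to avoid allocating a string object, because the data is usually larger than 85 KB, so the string would end up as a Gen 2 (large object heap) object.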
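A quick way to observe this effect is to ask the GC which generation a freshly allocated string is reported in. The sketch below is illustrative only: the class and method names are made up for this example, and it assumes the runtime's default large-object threshold of 85,000 bytes.

```csharp
using System;

public static class LargeStringDemo
{
    // Returns the GC generation reported for a newly allocated string
    // of the given length (each char occupies 2 bytes).
    public static int GenerationOfNewString(int charCount)
    {
        var s = new string('x', charCount);
        return GC.GetGeneration(s);
    }
}
```

With default settings, GenerationOfNewString(16) would normally report generation 0, while GenerationOfNewString(50_000) (roughly 100 KB of character data) would be expected to report GC.MaxGeneration, because the string is placed on the large object heap.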

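Is there any way to pass a char[] to the XmlSerializer without converting it to a string? It accepts a Stream or a TextReader, but I can't find a way to construct either one from a char[].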

I am imagining something like this (except that C# has no CharArrayStream or CharArrayReader):

public MyEntity DeserializeXmlDocument(char [] buffer, int contentLength) {
    using (var stream = new CharArrayStream(buffer, contentLength))
    {
        return _xmlSerializer.Deserialize(stream) as MyEntity;
    }
}

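As additional information: we are profiling existing code and have identified this as a pain point, so this is not a case of "premature optimization" or an "XY problem".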

c# xmlserializer
2 answers

0 votes

I refashioned the code that @György Kőszeg linked to into a CharArrayStream class. It has worked in my tests so far:

using System;
using System.IO;

public class CharArrayStream : Stream
{
    private readonly char[] str;
    private readonly int n;

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => n;
    public override long Position { get; set; } // TODO: bounds check

    public CharArrayStream(char[] str, int n)
    {
        this.str = str;
        this.n = n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin:
                Position = offset;
                break;
            case SeekOrigin.Current:
                Position += offset;
                break;
            case SeekOrigin.End:
                // Stream.Seek adds the offset to the origin; it is usually <= 0 for SeekOrigin.End.
                Position = Length + offset;
                break;
        }

        return Position;
    }

    // Truncates each UTF-16 char to its low byte, so this is only safe when the XML is pure ASCII.
    private byte this[int i] => (byte)str[i];

    public override int Read(byte[] buffer, int offset, int count)
    {
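        // TODO: validate buffer, offset and count
        // Clamp to the remaining data so that 0 is returned at or past the end.
        var len = Math.Max(0, Math.Min(count, Length - Position));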
        for (int i = 0; i < len; i++)
        {
            buffer[offset++] = this[(int)(Position++)];
        }
        return (int)len;
    }

    public override int ReadByte() => Position >= Length ? -1 : this[(int)Position++];
    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override string ToString() => throw new NotSupportedException();
}

I can use it this way:

public MyEntity DeserializeXmlDocument(char [] buffer, int contentLength) {
    using (var stream = new CharArrayStream(buffer, contentLength))
    {
        return _xmlSerializer.Deserialize(stream) as MyEntity;
    }
}

Thanks, @György Kőszeg!


0 votes

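It is quite easy to subclass TextReader to read from an array of chars or an equivalent. Here is a version that takes a ReadOnlyMemory<char>, which can represent a slice of either a string or a char[] character array: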

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class CharMemoryReader : TextReader
{
    private ReadOnlyMemory<char> chars;
    private int position;

    public CharMemoryReader(ReadOnlyMemory<char> chars)
    {
        this.chars = chars;
        this.position = 0;
    }

    void CheckClosed()
    {
        if (position < 0)
            throw new ObjectDisposedException(null, string.Format("{0} is closed.", ToString()));
    }

    public override void Close() => Dispose(true);

    protected override void Dispose(bool disposing)
    {
        chars = ReadOnlyMemory<char>.Empty;
        position = -1;
        base.Dispose(disposing);
    }

    public override int Peek()
    {
        CheckClosed();
        return position >= chars.Length ? -1 : chars.Span[position];
    }

    public override int Read()
    {
        CheckClosed();
        return position >= chars.Length ? -1 : chars.Span[position++];
    }

    public override int Read(char[] buffer, int index, int count)
    {
        CheckClosed();
        if (buffer == null)
            throw new ArgumentNullException(nameof(buffer));
        if (index < 0)
            throw new ArgumentOutOfRangeException(nameof(index));
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (buffer.Length - index < count)
            throw new ArgumentException("buffer.Length - index < count");

        return Read(buffer.AsSpan().Slice(index, count));
    }

    public override int Read(Span<char> buffer)
    {
        CheckClosed();

        var nRead = chars.Length - position;
        if (nRead > 0)
        {
            if (nRead > buffer.Length)
                nRead = buffer.Length;
            chars.Span.Slice(position, nRead).CopyTo(buffer);
            position += nRead;
        }
        return nRead;
    }

    public override string ReadToEnd()
    {
        CheckClosed();
        var s = position == 0 ? chars.ToString() : chars.Slice(position, chars.Length - position).ToString();
        position = chars.Length;
        return s;
    }

    public override string ReadLine()
    {
        CheckClosed();
        var span = chars.Span;
        var i = position;
        for( ; i < span.Length; i++)
        {
            var ch = span[i];
            if (ch == '\r' || ch == '\n')
            {
                var result = span.Slice(position, i - position).ToString();
                position = i + 1;
                if (ch == '\r' && position < span.Length && span[position] == '\n')
                    position++;
                return result;
            }
        }
        if (i > position)
        {
            var result = span.Slice(position, i - position).ToString();
            position = i;
            return result;
        }
        return null;
    }

    public override int ReadBlock(char[] buffer, int index, int count) => Read(buffer, index, count);
    public override int ReadBlock(Span<char> buffer) => Read(buffer);

    public override Task<String> ReadLineAsync() => Task.FromResult(ReadLine());
    public override Task<String> ReadToEndAsync() => Task.FromResult(ReadToEnd());
    public override Task<int> ReadBlockAsync(char[] buffer, int index, int count) => Task.FromResult(ReadBlock(buffer, index, count));
    public override Task<int> ReadAsync(char[] buffer, int index, int count) => Task.FromResult(Read(buffer, index, count));
    public override ValueTask<int> ReadBlockAsync(Memory<char> buffer, CancellationToken cancellationToken = default) =>
        cancellationToken.IsCancellationRequested ? new ValueTask<int>(Task.FromCanceled<int>(cancellationToken)) : new ValueTask<int>(ReadBlock(buffer.Span));
    public override ValueTask<int> ReadAsync(Memory<char> buffer, CancellationToken cancellationToken = default) =>
        cancellationToken.IsCancellationRequested ? new ValueTask<int>(Task.FromCanceled<int>(cancellationToken)) : new ValueTask<int>(Read(buffer.Span)); 
}

Then use it with one of the following extension methods:

using System;
using System.Xml.Serialization;

public static partial class XmlSerializationHelper
{
    public static T LoadFromXml<T>(this char [] xml, int contentLength, XmlSerializer serial = null) => 
        new ReadOnlyMemory<char>(xml, 0, contentLength).LoadFromXml<T>(serial);

    public static T LoadFromXml<T>(this ReadOnlyMemory<char> xml, XmlSerializer serial = null)
    {
        serial = serial ?? new XmlSerializer(typeof(T));
        using (var reader = new CharMemoryReader(xml))
            return (T)serial.Deserialize(reader);
    }
}

For example:

var result = buffer.LoadFromXml<MyEntity>(contentLength, _xmlSerializer);
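The ReadOnlyMemory<char> parameter used by this answer can wrap either kind of source without copying any characters. A minimal sketch of the two ways to obtain one follows; the MemorySliceDemo class and its method names are hypothetical and exist only for illustration.

```csharp
using System;

public static class MemorySliceDemo
{
    // Wraps the first contentLength chars of the buffer; no characters are copied,
    // so later changes to the array are visible through the returned memory.
    public static ReadOnlyMemory<char> FromBuffer(char[] buffer, int contentLength) =>
        new ReadOnlyMemory<char>(buffer, 0, contentLength);

    // Wraps part of an existing string, again without copying.
    public static ReadOnlyMemory<char> FromString(string text, int start, int length) =>
        text.AsMemory(start, length);
}
```

Either result could then be handed to a reader such as CharMemoryReader, so the same deserialization path serves both a pooled char[] buffer and an ordinary string.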

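Notes:

  • A char[] character array has basically the same contents as a UTF-16 encoded memory stream without a BOM, so one could create a custom Stream implementation, resembling MemoryStream, that represents each char as two bytes, as is done by this answer to "How do I generate a stream from a string?" by György Kőszeg. Doing this entirely correctly looks a bit tricky, because getting all of the async methods right seems nontrivial.

    Even with that done, XmlReader would still need to wrap the custom stream in a StreamReader that "decodes" the stream back into a sequence of characters, correctly inferring the encoding in the process (which I have observed can sometimes go wrong, for example when the encoding stated in the XML declaration does not match the actual encoding).

    I chose to create a custom TextReader rather than a custom Stream to avoid the unnecessary decoding step, and because the async implementation seemed less burdensome.

  • Representing each char as a single byte via truncation (e.g. (byte)str[i]) will corrupt XML that contains any multibyte characters.

  • I have not done any performance tuning on the above implementation.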
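To make the difference between the two byte representations discussed in these notes concrete, here is a small sketch. The CharToByteDemo class and its method names are assumptions made for illustration; Encoding.Unicode is the framework's little-endian UTF-16 encoding, and GetBytes does not emit a BOM.

```csharp
using System;
using System.Text;

public static class CharToByteDemo
{
    // Mirrors the (byte)str[i] approach: keeps only the low 8 bits of each char.
    public static byte[] Truncate(char[] chars, int count)
    {
        var bytes = new byte[count];
        for (int i = 0; i < count; i++)
            bytes[i] = unchecked((byte)chars[i]);
        return bytes;
    }

    // Two bytes per char (UTF-16, little-endian, no BOM), so nothing is lost.
    public static byte[] ToUtf16(char[] chars, int count) =>
        Encoding.Unicode.GetBytes(chars, 0, count);
}
```

For the euro sign '€' (U+20AC), Truncate yields the single byte 0xAC, which no longer identifies the character, whereas ToUtf16 yields the bytes 0xAC 0x20, which Encoding.Unicode.GetString decodes back to "€".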

Demo fiddle here.
