Splitting a string on multiple delimiters without creating new object allocations

Question   Votes: 0   Answers: 3

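How can I turn this into a method that removes the duplication (i.e., Don't Repeat Yourself) while also avoiding any new boxing or object allocations, or at least allocating as little as possible?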

private static IEnumerable<string> SeparateLineIntoMultipleDefinitions(string line) {
    string[] splitEntries;
    splitEntries = (from str in line.Split(new[] {", "}, StringSplitOptions.RemoveEmptyEntries)
                    where str.Contains('=')
                    select str).ToArray();
    if (splitEntries.Length > 2) return splitEntries;
    splitEntries = (from str in line.Split(',')
                    where str.Contains('=')
                    select str).ToArray();
    if (splitEntries.Length > 2) return splitEntries;
    splitEntries = (from str in line.Split(' ')
                    where str.Contains('=')
                    select str).ToArray();
    if (splitEntries.Length > 2) return splitEntries;
    return Enumerable.Empty<string>();
}

Initially I tried writing a method like this:

IEnumerable<string> SplitEntries(object splitter) {
    return splitter switch {
        string[] strArray => (from str in line.Split(strArray, StringSplitOptions.RemoveEmptyEntries)
                  where str.Contains('=')
                  select str),
        string s => (from str in line.Split(new[] {s}, StringSplitOptions.RemoveEmptyEntries)
                  where str.Contains('=')
                  select str),
        char charSplitter => (from str in line.Split(charSplitter)
                  where str.Contains('=')
                  select str),
        _ => Enumerable.Empty<string>()
    };
}

Unfortunately, calling it with a char boxes the char into an object.
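One possible way to avoid the boxing, sketched here under assumptions: instead of passing the separator through an object parameter, call the strongly typed Split overloads directly and share only the filtering step. The names DefinitionSplitter, KeepDefinitions, and Separate are hypothetical, and the "> 2" threshold is copied from the code above. This removes the boxing and the LINQ iterators, but Split itself still allocates its result array and substrings.

```csharp
using System;
using System.Collections.Generic;

public static class DefinitionSplitter
{
    // Cached so the string[] separator is not re-allocated on every call.
    private static readonly string[] CommaSpace = { ", " };

    // Shared filter (hypothetical helper): keeps only entries that contain '='.
    // A plain loop is used instead of LINQ to avoid iterator and delegate allocations.
    private static List<string> KeepDefinitions(string[] parts)
    {
        var result = new List<string>(parts.Length);
        foreach (var part in parts)
        {
            if (part.IndexOf('=') >= 0) result.Add(part);
        }
        return result;
    }

    // Tries ", " first, then ',', then ' ', mirroring the order in the question.
    public static IReadOnlyList<string> Separate(string line)
    {
        var entries = KeepDefinitions(line.Split(CommaSpace, StringSplitOptions.RemoveEmptyEntries));
        if (entries.Count > 2) return entries;

        entries = KeepDefinitions(line.Split(','));   // char overload, no boxing
        if (entries.Count > 2) return entries;

        entries = KeepDefinitions(line.Split(' '));   // char overload, no boxing
        if (entries.Count > 2) return entries;

        return Array.Empty<string>();
    }
}
```

Each separator type is resolved at compile time, so no value is ever converted to object.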

For this particular scenario, I want it to try splitting on ", " first, then on just ',', and then on just ' '. However, if I call

line.Split(',', ' ')

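I believe it will split something like this=that,there x=y b=c into this=that, there, x=y, and b=c. But I don't want that extra entry, there. (My original intention was to keep there by including it in the preceding entry, i.e. this=that,there. Because I did not make that clear, the answers drop the entry there instead of showing how to keep it, so I am leaving the question as it is.)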
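To make the behaviour described above concrete, here is a minimal sketch of splitting on either character. SplitOnEitherDemo and SplitOnEither are hypothetical names. The separator array is cached in a static field because line.Split(',', ' ') binds to the params char[] overload, which builds a new array on each call.

```csharp
using System;

public static class SplitOnEitherDemo
{
    // Cached so that each call does not allocate a fresh params char[] array.
    private static readonly char[] Separators = { ',', ' ' };

    // Splits on ',' or ' ' without any filtering, so "there" is kept as its own entry.
    public static string[] SplitOnEither(string line) => line.Split(Separators);
}
```

For the sample line, the result has four entries, including the unwanted there, so a filter on '=' is still needed afterwards.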

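I would use a regular expression, but I assume that avoiding one gives a more memory-efficient solution (this may be a false premise; I am happy to be proven wrong). Although this could be considered over-engineering, I am curious what the solution is, because I have a feeling I am missing something very simple.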

c# string split
3 Answers

5 votes

An alternative approach that splits using ReadOnlySpan<char>, without using Regex, string.Split(), or System.Linq.


Code (UsingSpan):

public static IEnumerable<string> SeparateLineIntoMultipleDefinitions(ReadOnlySpan<char> line)
{
    List<string> definitions = new List<string>();

    bool captureIsStarted = false;

    int equalSignCount = 0;
    int lastEqualSignPosition = 0;
    int captureStart = 0;
    int captureEnd = 0;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];

        if (c != ',' && !char.IsWhiteSpace(c))
        {
            if (captureIsStarted)
            {
                captureEnd = i;
            }

            else
            {
                captureStart = i;
                captureIsStarted = true;
            }

            if (c == '=')
            {
                equalSignCount++;
                lastEqualSignPosition = i;
            }
        }
        else
        {
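            // Accept the token only if it contains exactly one '=' that is neither its first nor its last character.
            if (equalSignCount == 1 && lastEqualSignPosition > captureStart && lastEqualSignPosition < captureEnd)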
            {
                definitions.Add(line[captureStart..(captureEnd + 1)].ToString());
            }

            equalSignCount = 0;
            captureIsStarted = false;
        }
    }

    if (captureIsStarted && equalSignCount == 1 && lastEqualSignPosition > captureStart && lastEqualSignPosition < captureEnd)
    {
        definitions.Add(line[captureStart..(captureEnd + 1)].ToString());
    }  

    return definitions;
}

Code (UsingSpan_ZeroAllocation):

public static (int, (int, int)[]) SeparateLineIntoMultipleDefinitions_ZeroAllocation(ReadOnlySpan<char> line)
{
    int count = 0;
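    // Rent a buffer from the shared pool instead of allocating one. Rent may return an array
    // longer than requested, so callers must use the returned count and Return the array when done.
    (int, int)[] ranges = ArrayPool<(int, int)>.Shared.Rent(line.Length);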

    bool captureIsStarted = false;

    int equalSignCount = 0;
    int lastEqualSignPosition = 0;
    int captureStart = 0;
    int captureEnd = 0;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];

        if (c != ',' && !char.IsWhiteSpace(c))
        {
            if (captureIsStarted)
            {
                captureEnd = i;
            }

            else
            {
                captureStart = i;
                captureIsStarted = true;
            }

            if (c == '=')
            {
                equalSignCount++;
                lastEqualSignPosition = i;
            }
        }
        else
        {
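            // Accept the token only if it contains exactly one '=' that is neither its first nor its last character.
            if (equalSignCount == 1 && lastEqualSignPosition > captureStart && lastEqualSignPosition < captureEnd)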
            {
                ranges[count++] = (captureStart, captureEnd + 1);
            }

            equalSignCount = 0;
            captureIsStarted = false;
        }
    }

    if (captureIsStarted && equalSignCount == 1 && lastEqualSignPosition > captureStart && lastEqualSignPosition < captureEnd)
    {
        ranges[count++] = (captureStart, captureEnd + 1);
    }

    return (count, ranges);
}

Usage example:

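var line = "this=that,there x=y b=c";

var (count, ranges) = SeparateLineIntoMultipleDefinitions_ZeroAllocation(line);

for (int i = 0; i < count; i++)
{
    // Each tuple holds (start, exclusive end), not (offset, length).
    var (start, end) = ranges[i];

    // Write the slice as a span; line[start..end] on a string would allocate a new string.
    Console.Out.WriteLine(line.AsSpan(start, end - start));
}

// Hand the rented buffer back to the pool once the ranges are no longer needed.
ArrayPool<(int, int)>.Shared.Return(ranges);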
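As a variation on the same idea, the rented array can be avoided altogether by yielding slices lazily from a ref struct enumerator. The following is only a sketch under assumptions: DefinitionEnumerator is a hypothetical name, it is not one of the benchmarked methods, and it applies the same acceptance rule as the code above (exactly one '=', not at either end of the token).

```csharp
using System;

// Hypothetical enumerator: walks the line and exposes each definition as a span slice.
public ref struct DefinitionEnumerator
{
    private ReadOnlySpan<char> _rest;     // part of the line not yet examined
    private ReadOnlySpan<char> _current;  // most recently accepted token

    public DefinitionEnumerator(ReadOnlySpan<char> line)
    {
        _rest = line;
        _current = default;
    }

    public ReadOnlySpan<char> Current => _current;

    // Lets the struct be used directly in a foreach statement.
    public DefinitionEnumerator GetEnumerator() => this;

    public bool MoveNext()
    {
        while (!_rest.IsEmpty)
        {
            // Skip leading separators, then find the end of the next token.
            int start = 0;
            while (start < _rest.Length && IsSeparator(_rest[start])) start++;
            int end = start;
            while (end < _rest.Length && !IsSeparator(_rest[end])) end++;

            ReadOnlySpan<char> token = _rest.Slice(start, end - start);
            _rest = _rest.Slice(end);

            // Accept only tokens with exactly one '=' that is neither first nor last.
            int eq = token.IndexOf('=');
            if (eq > 0 && eq < token.Length - 1 && token.Slice(eq + 1).IndexOf('=') < 0)
            {
                _current = token;
                return true;
            }
        }
        return false;
    }

    private static bool IsSeparator(char c) => c == ',' || char.IsWhiteSpace(c);
}
```

A caller would write foreach (var definition in new DefinitionEnumerator(line.AsSpan())) and consume each span in place; calling ToString() on a slice allocates a string, so it should be reserved for values that need to outlive the loop.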

Benchmark:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.17763.1039 (1809/October2018Update/Redstone5)
Intel Xeon CPU E5-2696 v4 2.20GHz, 2 CPU, 88 logical and 88 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT


|                   Method |     Mean |   Error |  StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------- |---------:|--------:|--------:|-------:|------:|------:|----------:|
| UsingSpan_ZeroAllocation | 139.9 ns | 0.86 ns | 0.76 ns |      - |     - |     - |         - |
|                UsingSpan | 176.3 ns | 1.66 ns | 1.47 ns | 0.0067 |     - |     - |     192 B |
|               UsingRegEx | 218.2 ns | 2.62 ns | 2.45 ns | 0.0088 |     - |     - |     256 B |
|                UsingLinq | 339.0 ns | 3.86 ns | 3.42 ns | 0.0100 |     - |     - |     288 B |
|            UsingOPMethod | 853.0 ns | 8.80 ns | 8.23 ns | 0.0210 |     - |     - |     624 B |

2 votes

Just use the Split(String[], StringSplitOptions) overload:

var line = "this=that,there x=y b=c";
var spl = new[] { ", ", ",", " " };
var result = (from str in line.Split(spl, StringSplitOptions.RemoveEmptyEntries)
                  where str.Contains('=')
                  select str);

Output:

this=that
x=y
b=c

1 vote

You can use a compiled regular expression.

Regex.Matches(strings,"[^=, ]*=[^=, ]*",RegexOptions.Compiled)
     .Cast<Match>().Select(x=>x.Value)

Here are the benchmark results for Regex versus the OP's method.
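As a side note to the one-liner above, a compiled pattern is commonly stored in a static readonly field so that it is constructed once and reused. The sketch below assumes that style; RegexDefinitionSplitter and Separate are hypothetical names. It also deliberately uses + instead of *, which is a change from the pattern above, so that entries with an empty key or value (such as "=x") are not matched.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexDefinitionSplitter
{
    // Built once and reused; "+" (an assumption, the answer uses "*") requires a non-empty key and value.
    private static readonly Regex DefinitionPattern =
        new Regex("[^=, ]+=[^=, ]+", RegexOptions.Compiled);

    // Collects every key=value match found in the line.
    public static List<string> Separate(string line)
    {
        var result = new List<string>();
        foreach (Match match in DefinitionPattern.Matches(line))
        {
            result.Add(match.Value);
        }
        return result;
    }
}
```

This still allocates a Match object and a string per result, so it is a convenience option rather than an allocation-free one.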
