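How can I turn this into a single method that removes the repetition (i.e., DRY: Don't Repeat Yourself) without introducing new boxing or object allocations, or at least with as few allocations as possible?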
```csharp
private static IEnumerable<string> SeparateLineIntoMultipleDefinitions(string line) {
    string[] splitEntries;
    splitEntries = (from str in line.Split(new[] {", "}, StringSplitOptions.RemoveEmptyEntries)
                    where str.Contains('=')
                    select str).ToArray();
    if (splitEntries.Length > 2) return splitEntries;
    splitEntries = (from str in line.Split(',')
                    where str.Contains('=')
                    select str).ToArray();
    if (splitEntries.Length > 2) return splitEntries;
    splitEntries = (from str in line.Split(' ')
                    where str.Contains('=')
                    select str).ToArray();
    if (splitEntries.Length > 2) return splitEntries;
    return Enumerable.Empty<string>();
}
```
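A minimal sketch of one way to remove the repetition, written as a .NET 5+ console program with top-level statements. The three attempts differ only in the separator, so the separators become data and a single loop does the work. Passing every separator as a string (`","` instead of `','`) lets all three attempts use the same `Split(string[], StringSplitOptions)` overload, so no `char` is ever converted to `object`. The name `separatorAttempts` is illustrative; LINQ and `ToArray` still allocate, just as in the original.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Separators to try, in order. Each is a string[], so every attempt uses the
// same Split overload and nothing is boxed. (Illustrative name.)
string[][] separatorAttempts =
{
    new[] { ", " }, // first try ", "
    new[] { "," },  // then just ','
    new[] { " " },  // then just ' '
};

IEnumerable<string> SeparateLineIntoMultipleDefinitions(string line)
{
    foreach (string[] separators in separatorAttempts)
    {
        // RemoveEmptyEntries only drops empty strings, which the '=' filter
        // would reject anyway, so the results match the original char splits.
        string[] entries = line
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.Contains('='))
            .ToArray();

        // Same acceptance rule as the original: more than two definitions.
        if (entries.Length > 2) return entries;
    }
    return Enumerable.Empty<string>();
}

foreach (string definition in SeparateLineIntoMultipleDefinitions("a=1 b=2 c=3"))
    Console.WriteLine(definition);
```

Keeping the separators in an array that is created once also avoids re-allocating `new[] {", "}` on every call.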
At first I tried writing a method like this:
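```csharp
// Presumably a local function inside SeparateLineIntoMultipleDefinitions,
// since `line` is not a parameter here but the enclosing method's parameter.
IEnumerable<string> SplitEntries(object splitter) {
    return splitter switch {
        string[] strArray => (from str in line.Split(strArray, StringSplitOptions.RemoveEmptyEntries)
                              where str.Contains('=')
                              select str),
        string s => (from str in line.Split(new[] {s}, StringSplitOptions.RemoveEmptyEntries)
                     where str.Contains('=')
                     select str),
        // A char can only reach this case after being converted to object (boxed).
        char charSplitter => (from str in line.Split(charSplitter)
                              where str.Contains('=')
                              select str),
        _ => Enumerable.Empty<string>()
    };
}
```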
Unfortunately, calling it with a `char` boxes the `char` into an `object`.
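The boxing comes from the parameter type: to match `object`, the `char` must be converted to an object first. A minimal sketch of one way around it, assuming the goal is only to avoid boxing: give each separator kind its own strongly typed entry point and share the filtering step. The names `OnlyDefinitions`, `SplitOnChar`, and `SplitOnStrings` are hypothetical. Local functions cannot be overloaded, hence the two names; in a class these could simply be two overloads of one method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Shared filtering step: keep only entries that contain '='.
static IEnumerable<string> OnlyDefinitions(string[] parts) =>
    parts.Where(p => p.Contains('='));

// The char stays a char all the way into string.Split, so nothing is boxed.
static IEnumerable<string> SplitOnChar(string line, char separator) =>
    OnlyDefinitions(line.Split(separator));

// String separators go through the string[] overload.
static IEnumerable<string> SplitOnStrings(string line, string[] separators) =>
    OnlyDefinitions(line.Split(separators, StringSplitOptions.RemoveEmptyEntries));

foreach (string d in SplitOnChar("a=1,b,c=3", ','))
    Console.WriteLine(d);
```

Overload resolution (or, here, the choice of name) happens at compile time, so no runtime type test on `object` is needed.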
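For this particular scenario, I want it to try splitting on `", "` first, then on just `','`, then on just `' '`. However, if I call `line.Split(',', ' ')`, I believe it will split something like `this=that,there x=y b=c` into `this=that`, `there`, `x=y`, and `b=c`. But I don't want the extra entry `there`. (My original intent was to keep `there` by including it in the preceding entry, i.e. `this=that,there`, but because I wasn't clear about that, the answers don't provide a way to do it; they drop the `there` entry instead. So I'll leave the question as it is.)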
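I would use a regular expression, but I assumed that avoiding it would give a more memory-efficient solution (possibly a wrong premise; I'd be happy to be proven wrong). Although this may count as over-engineering, I'm curious what the solution is, because I have a feeling I'm missing something very simple.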
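For comparison with the regex idea, here is one possible regex-based version. The pattern is an assumption of this sketch; it is not necessarily what the `UsingRegEx` benchmark row further down used, since that code is not shown. It returns every definition in the line rather than reproducing the staged "more than two entries" fallback, and whether it allocates less than the `Split`-based versions is something to measure rather than assume. `definitionPattern` and `FindDefinitions` are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// A definition is a run of characters other than ',' and whitespace that
// contains exactly one '=' with at least one character on each side.
//   (?<![^,\s])  start of input, or preceded by ',' or whitespace
//   [^,\s=]+     the name: no separators and no '='
//   =            the single '='
//   [^,\s=]+     the value: no separators and no '='
//   (?![^,\s])   end of input, or followed by ',' or whitespace
var definitionPattern = new Regex(@"(?<![^,\s])[^,\s=]+=[^,\s=]+(?![^,\s])");

string[] FindDefinitions(string line) =>
    definitionPattern.Matches(line).Cast<Match>().Select(m => m.Value).ToArray();

foreach (string d in FindDefinitions("this=that,there x=y b=c"))
    Console.WriteLine(d);
```

The lookarounds make the match respect token boundaries, so `there` (no `=`) and tokens such as `a=b=c` (two `=`) are skipped.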
An alternative that splits using `ReadOnlySpan<char>` instead of `Regex`, `string.Split()`, or `System.Linq`:
Code (UsingSpan):
```csharp
// Requires: using System; using System.Collections.Generic;
public static IEnumerable<string> SeparateLineIntoMultipleDefinitions(ReadOnlySpan<char> line)
{
    List<string> definitions = new List<string>();
    bool captureIsStarted = false;  // currently inside a token (not ',' and not whitespace)?
    int equalSignCount = 0;         // '=' characters seen in the current token
    int lastEqualSignPosition = 0;
    int captureStart = 0;           // index of the token's first character
    int captureEnd = 0;             // index of the token's last character (updated from its second character on)
    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (c != ',' && !char.IsWhiteSpace(c))
        {
            if (captureIsStarted)
            {
                captureEnd = i;
            }
            else
            {
                captureStart = i;
                captureIsStarted = true;
            }
            if (c == '=')
            {
                equalSignCount++;
                lastEqualSignPosition = i;
            }
        }
        else
        {
            // Token ended: keep it if it has exactly one '=' that is neither its first nor its last character.
            if (equalSignCount == 1 && lastEqualSignPosition > captureStart && lastEqualSignPosition < captureEnd)
            {
                definitions.Add(line[captureStart..(captureEnd + 1)].ToString());
            }
            equalSignCount = 0;
            captureIsStarted = false;
        }
    }
    // The line may end inside a token, so check the last one as well.
    if (captureIsStarted && equalSignCount == 1 && lastEqualSignPosition > captureStart && lastEqualSignPosition < captureEnd)
    {
        definitions.Add(line[captureStart..(captureEnd + 1)].ToString());
    }
    return definitions;
}
```
Code (UsingSpan_ZeroAllocation):
```csharp
// Requires: using System; using System.Buffers;
// Returns the number of definitions found and a rented array whose first `count`
// entries are (start, exclusive end) pairs. The caller must return the array
// to ArrayPool<(int, int)>.Shared.
public static (int, (int, int)[]) SeparateLineIntoMultipleDefinitions_ZeroAllocation(ReadOnlySpan<char> line)
{
    int count = 0;
    // A line can never hold more than line.Length definitions, so this is always big enough.
    (int, int)[] ranges = ArrayPool<(int, int)>.Shared.Rent(line.Length);
    bool captureIsStarted = false;
    int equalSignCount = 0;
    int lastEqualSignPosition = 0;
    int captureStart = 0;
    int captureEnd = 0;
    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (c != ',' && !char.IsWhiteSpace(c))
        {
            if (captureIsStarted)
            {
                captureEnd = i;
            }
            else
            {
                captureStart = i;
                captureIsStarted = true;
            }
            if (c == '=')
            {
                equalSignCount++;
                lastEqualSignPosition = i;
            }
        }
        else
        {
            // Same acceptance rule as UsingSpan, but record the range instead of creating a string.
            if (equalSignCount == 1 && lastEqualSignPosition > captureStart && lastEqualSignPosition < captureEnd)
            {
                ranges[count++] = (captureStart, captureEnd + 1);
            }
            equalSignCount = 0;
            captureIsStarted = false;
        }
    }
    if (captureIsStarted && equalSignCount == 1 && lastEqualSignPosition > captureStart && lastEqualSignPosition < captureEnd)
    {
        ranges[count++] = (captureStart, captureEnd + 1);
    }
    return (count, ranges);
}
```
Usage example:
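```csharp
var line = "this=that,there x=y b=c";
var (count, ranges) = SeparateLineIntoMultipleDefinitions_ZeroAllocation(line);
for (int i = 0; i < count; i++)
{
    // Each entry is (start, exclusive end), not (offset, length).
    var (start, end) = ranges[i];
    Console.WriteLine(line[start..end]);
}
// Return the rented array to the pool once the ranges are no longer needed.
ArrayPool<(int, int)>.Shared.Return(ranges);
```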
Benchmark:
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.17763.1039 (1809/October2018Update/Redstone5)
Intel Xeon CPU E5-2696 v4 2.20GHz, 2 CPU, 88 logical and 88 physical cores
.NET Core SDK=3.1.101
[Host] : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------- |---------:|--------:|--------:|-------:|------:|------:|----------:|
| UsingSpan_ZeroAllocation | 139.9 ns | 0.86 ns | 0.76 ns | - | - | - | - |
| UsingSpan | 176.3 ns | 1.66 ns | 1.47 ns | 0.0067 | - | - | 192 B |
| UsingRegEx | 218.2 ns | 2.62 ns | 2.45 ns | 0.0088 | - | - | 256 B |
| UsingLinq | 339.0 ns | 3.86 ns | 3.42 ns | 0.0100 | - | - | 288 B |
| UsingOPMethod | 853.0 ns | 8.80 ns | 8.23 ns | 0.0210 | - | - | 624 B |
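A further variant, not part of the benchmark above and offered only as a sketch: instead of renting from `ArrayPool`, the caller supplies the destination buffer, for example with `stackalloc`, so there is nothing to return afterwards. It records `System.Range` values rather than `(int, int)` tuples so the results can index the string directly. `IsDefinition` and `FindDefinitions` are hypothetical names, and silently ignoring definitions that do not fit in the buffer is an assumption of this sketch.

```csharp
using System;

// True when the token holds exactly one '=' that is neither its first nor its
// last character (the same rule as the equalSignCount / lastEqualSignPosition test).
static bool IsDefinition(ReadOnlySpan<char> token)
{
    int eq = token.IndexOf('=');
    return eq > 0 && eq < token.Length - 1 && token.LastIndexOf('=') == eq;
}

// Writes the range of each definition into `destination` and returns how many
// were written. Tokens are separated by ',' or whitespace. Definitions beyond
// destination.Length are dropped.
static int FindDefinitions(ReadOnlySpan<char> line, Span<Range> destination)
{
    int count = 0;
    int tokenStart = -1; // -1 while between tokens
    for (int i = 0; i <= line.Length; i++)
    {
        // Treat the end of the line as a final separator so the last token is checked.
        bool isSeparator = i == line.Length || line[i] == ',' || char.IsWhiteSpace(line[i]);
        if (!isSeparator)
        {
            if (tokenStart < 0) tokenStart = i; // first character of a new token
            continue;
        }
        if (tokenStart >= 0 && count < destination.Length && IsDefinition(line[tokenStart..i]))
            destination[count++] = tokenStart..i;
        tokenStart = -1;
    }
    return count;
}

string input = "this=that,there x=y b=c";
Span<Range> found = stackalloc Range[16]; // lives on the stack: nothing to rent or return
int foundCount = FindDefinitions(input, found);
for (int k = 0; k < foundCount; k++)
    Console.WriteLine(input[found[k]]);
```

Treating `i == line.Length` as a separator removes the duplicated end-of-line check, at the cost of one extra loop iteration.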
Using the `Split(String[], StringSplitOptions)` overload:
```csharp
var line = "this=that,there x=y b=c";
var spl = new[] { ", ", ",", " " };
var result = (from str in line.Split(spl, StringSplitOptions.RemoveEmptyEntries)
              where str.Contains('=')
              select str);
```
Output:

```
this=that
x=y
b=c
```
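The same query wrapped in a reusable helper might look like the following sketch; `Definitions` is an illustrative name. The separator array is created once and reused for every line rather than allocated per call (inside a class it would typically be a `static readonly` field). Unlike the question's staged fallback, all three separators are applied in a single split.

```csharp
using System;
using System.Linq;

// Allocated once, reused for every line.
string[] separators = { ", ", ",", " " };

// Split on all three separators at once, then keep entries containing '='.
string[] Definitions(string line) =>
    line.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        .Where(s => s.Contains('='))
        .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, Definitions("this=that,there x=y b=c")));
```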