Whichever of two equivalent LINQ queries is executed first always runs slower

Question (votes: 0, answers: 2)

Consider the following two ways of writing this LINQ query:

Option 1:

public void MyMethod(List<MyObject> myList)
{
   ...
   var isValid = myList.Where(l => l.IsActive)
                       .GroupBy(l => l.Category)
                       .Select(g => g.Count() > 300) //arbitrary number for the sake of argument
                       .Any();
}

Option 2:

public void MyMethod(List<MyObject> myList)
{
   ...
   var isValid = myList.Where(l => l.IsActive)
                       .GroupBy(l => l.Category)
                       .Select(g => g.Count()) 
                       .Any(total => total > 300); //arbitrary number for the sake of argument
}

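I wanted to see whether there is any performance difference between the two, so I created a console application (shown below) to compare them.

What happens is that whichever query is executed first always runs slower, and on subsequent runs both report 0 milliseconds. I then changed the comparison to use Ticks and got similar results. If I swap the order in which the queries are executed, the new first one is now the slower one.

So the question is twofold: why does the first query executed appear to be the slower one? And is there a way to actually compare the performance of the two?

Here is the test code: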

static void Main(string[] args)
{
    Console.WriteLine("Running test");

    var rnd = new Random();

    for (var i = 0;i < 5; i++)
    {
        RunTest(i, rnd);
        Console.WriteLine();
        Console.WriteLine();
    }

    Console.ReadKey();
}

private static void RunTest(int runId, Random rnd)
{
    var list = GetData(rnd);

    var startOne = DateTime.Now.TimeOfDay;

    var one = list.Where(l => l.IsActive)
        .GroupBy(l => l.Category)
        .Select(g => g.Count() > 300)
        .Any();

    var endOne = DateTime.Now.TimeOfDay;

    var startTwo = DateTime.Now.TimeOfDay;

    var two = list.Where(l => l.IsActive)
        .GroupBy(l => l.Category)
        .Select(g => g.Count())
        .Any(c => c > 300);

    var endTwo = DateTime.Now.TimeOfDay;

    var resultOne = (endOne - startOne).Milliseconds;
    var resultTwo = (endTwo - startTwo).Milliseconds;

    Console.WriteLine($"Results for test run #{++runId}");
    Console.WriteLine();

    Console.WriteLine($"Category 1 total: {list.Where(l => l.Category == 1 && l.IsActive).Count()}");
    Console.WriteLine($"Category 2 total: {list.Where(l => l.Category == 2 && l.IsActive).Count()}");
    Console.WriteLine($"Category 3 total: {list.Where(l => l.Category == 3 && l.IsActive).Count()}");
    Console.WriteLine();

    Console.WriteLine($"First option runs in: {resultOne} ");
    Console.WriteLine();
    Console.WriteLine($"Second option runs in: {resultTwo} ");
}

    private static List<MyObject> GetData(Random rnd)
    {
        var result = new List<MyObject>();

        for (var i = 0; i < 1000; i++)
        {                
            result.Add(new MyObject { Category = rnd.Next(1, 4), IsActive = rnd.Next(0, 2) != 0 });
        }

        return result;
    }
}

    public class MyObject
    {
        public bool IsActive { get; set; }
        public int Category { get; set; }
    }
c# performance linq benchmarking
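2 Answers
2 votes

There are several problems with your benchmarking approach.

First, when you have two DateTime values and you compare them using their TimeOfDay property...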

var startOne = DateTime.Now.TimeOfDay;
// Do some work
var endOne = DateTime.Now.TimeOfDay;
var resultOne = (endOne - startOne).Milliseconds;

...you risk getting a negative duration if the test happens to span a day transition (midnight). Consider this...

DateTime midnight = DateTime.Today;
DateTime fiveSecondsBeforeMidnight = midnight - TimeSpan.FromSeconds(5);
DateTime fiveSecondsAfterMidnight  = midnight + TimeSpan.FromSeconds(5);

Console.WriteLine($"Difference between DateTime  values: {fiveSecondsAfterMidnight - fiveSecondsBeforeMidnight}");
Console.WriteLine($"Difference between TimeOfDay values: {fiveSecondsAfterMidnight.TimeOfDay - fiveSecondsBeforeMidnight.TimeOfDay}");

...prints...

Difference between DateTime  values: 00:00:10
Difference between TimeOfDay values: -23:59:50

Instead, you can fix this bug and simplify the code by comparing the DateTime values directly...

var startOne = DateTime.Now;
// Do some work
var endOne = DateTime.Now;
var resultOne = (endOne - startOne).Milliseconds;

However, this can be improved further by using the Stopwatch class, which is more accurate than comparing DateTime values and is designed specifically for this purpose...

Stopwatch stopwatch = Stopwatch.StartNew();
// Do some work
TimeSpan resultOne = stopwatch.Elapsed;

stopwatch.Restart();
// Do some work
TimeSpan resultTwo = stopwatch.Elapsed;

Second, the TimeSpan.Milliseconds property returns only the milliseconds component of the TimeSpan value. To get the entire TimeSpan value expressed in milliseconds, you need the TotalMilliseconds property. Consider the difference here...

TimeSpan value1 = TimeSpan.FromSeconds(1) + TimeSpan.FromMilliseconds(500);
TimeSpan value2 = TimeSpan.FromMilliseconds(900);

Console.WriteLine($"     value1.Milliseconds: {value1.Milliseconds}");
Console.WriteLine($"value1.TotalMilliseconds: {value1.TotalMilliseconds}");
Console.WriteLine($"     value2.Milliseconds: {value2.Milliseconds}");
Console.WriteLine($"value2.TotalMilliseconds: {value2.TotalMilliseconds}");
Console.WriteLine($"value1 is {(value1.Milliseconds < value2.Milliseconds ? "less" : "greater")} than value2 (by Milliseconds)");
Console.WriteLine($"value1 is {(value1.TotalMilliseconds < value2.TotalMilliseconds ? "less" : "greater")} than value2 (by TotalMilliseconds)");

...prints...

     value1.Milliseconds: 500
value1.TotalMilliseconds: 1500
     value2.Milliseconds: 900
value2.TotalMilliseconds: 900
value1 is less than value2 (by Milliseconds)
value1 is greater than value2 (by TotalMilliseconds)

Comparing the Ticks property of the TimeSpan values, as you did, is one way to work around this. Alternatively, you can store the time difference as a TimeSpan without selecting one of its properties and let string formatting handle the smaller components...

TimeSpan resultOne = endOne - startOne;
TimeSpan resultTwo = endTwo - startTwo;

// ...

Console.WriteLine($"First option runs in: {resultOne:s\\.ffffff} seconds");
Console.WriteLine();
Console.WriteLine($"Second option runs in: {resultTwo:s\\.ffffff} seconds");

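Finally, I ran your code and saw the same results you did: a non-zero time for the first run and zero for subsequent runs. My guess is that the first run takes longer because your code has not yet been JIT-compiled and optimized. Even those "slow" first runs take only a few milliseconds to complete, because your list has just one thousand items. Benchmarks that short do not provide a meaningful comparison.

After making the above changes and increasing the size of the List<> returned by GetData() to 10 million items, each run takes a few seconds, with the first option being a few milliseconds faster on the first run and 25-125 milliseconds slower on subsequent runs.

Instead of rolling your own benchmark code, you might consider using a library such as BenchmarkDotNet. It handles details such as calculating how many runs to perform, "warming up" the code to ensure it has been optimized, and computing statistics for you.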
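If you want to see the warm-up effect by hand before reaching for a library, here is a minimal sketch. The `Measure` helper and the `WarmupTiming` class are hypothetical names I made up for illustration, not part of any library. The idea is to call each query once untimed, so that any one-time JIT cost is paid up front, and then average many timed iterations with a Stopwatch.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class MyObject
{
    public bool IsActive { get; set; }
    public int Category { get; set; }
}

public static class WarmupTiming
{
    // Hypothetical helper: invokes the action once untimed (warm-up),
    // then returns the average elapsed time over the timed iterations.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up call, excluded from the measurement

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / iterations);
    }

    // Same data shape as GetData() in the question, with a configurable size.
    public static List<MyObject> GetData(Random rnd, int count)
    {
        var result = new List<MyObject>(count);
        for (var i = 0; i < count; i++)
        {
            result.Add(new MyObject { Category = rnd.Next(1, 4), IsActive = rnd.Next(0, 2) != 0 });
        }
        return result;
    }

    public static void Main()
    {
        var list = GetData(new Random(), 1000);
        var sink = false; // accumulate results so the queries are not dead code

        var one = Measure(() => sink ^= list.Where(l => l.IsActive)
            .GroupBy(l => l.Category)
            .Select(g => g.Count() > 300)
            .Any(), 1000);

        var two = Measure(() => sink ^= list.Where(l => l.IsActive)
            .GroupBy(l => l.Category)
            .Select(g => g.Count())
            .Any(c => c > 300), 1000);

        Console.WriteLine($"Option 1 average: {one.TotalMilliseconds:F4} ms");
        Console.WriteLine($"Option 2 average: {two.TotalMilliseconds:F4} ms");
        Console.WriteLine($"(sink: {sink})");
    }
}
```

This only removes the most obvious distortion. It does nothing about garbage collection pauses, timer resolution, or run-to-run variance, which is why a dedicated benchmarking library is still the better choice for a real comparison.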


4 votes

Yes, you can use BenchmarkDotNet to accurately compare the performance of the two options. It makes for a test script that is simple to set up.

void Main()
{
    var summary = BenchmarkRunner.Run<CollectionBenchmark>();
}

[MemoryDiagnoser]
public class CollectionBenchmark
{
    private static Random random = new Random();
    private List<MyObject> _list = new List<MyObject>();

    [GlobalSetup]
    public void GlobalSetup()
    {
        var rnd = new Random();
        for (var i = 0; i < 1000; i++)
        {
            _list.Add(new MyObject { Category = rnd.Next(1, 4), IsActive = rnd.Next(0, 2) != 0 });
        }
    }

    [Benchmark]
    public void OptionOne()
    {
        var one = _list.Where(l => l.IsActive)
            .GroupBy(l => l.Category)
            .Select(g => g.Count() > 300)
            .Any();
    }

    [Benchmark]
    public void OptionTwo()
    {
        var two = _list.Where(l => l.IsActive)
            .GroupBy(l => l.Category)
            .Select(g => g.Count())
            .Any(c => c > 300);
    }
}

public class MyObject
{
    public bool IsActive { get; set; }
    public int Category { get; set; }
}

This produced the following results on my machine:

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i5-6300U CPU 2.40GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
Frequency=2437498 Hz, Resolution=410.2567 ns, Timer=TSC
  [Host]     : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3324.0
  DefaultJob : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3324.0


|    Method |     Mean |     Error |    StdDev |  Gen 0 | Allocated |
|---------- |---------:|----------:|----------:|-------:|----------:|
| OptionOne | 36.73 us | 0.7491 us | 1.9202 us | 8.4839 |  13.13 KB |
| OptionTwo | 36.37 us | 0.6993 us | 0.8053 us | 8.4839 |  13.13 KB |

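The allocated memory is identical. Given that the measured difference in mean time is a fraction of a microsecond, which is smaller than the reported error, there is no practical performance difference between the two.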
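One thing worth checking alongside the timings: as written, the two options do not appear to return the same value. Any() with no predicate only asks whether the sequence has at least one element, so Option 1 is true whenever there is at least one active item, regardless of the counts. The sketch below uses a hypothetical `EquivalenceCheck` class and a tiny hand-built list (both my own, for illustration) to show the difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public bool IsActive { get; set; }
    public int Category { get; set; }
}

public static class EquivalenceCheck
{
    // Option 1 shape: projects each group to a bool, then asks only
    // whether the projected sequence contains any element at all.
    public static bool OptionOne(IEnumerable<MyObject> items, int threshold) =>
        items.Where(l => l.IsActive)
             .GroupBy(l => l.Category)
             .Select(g => g.Count() > threshold)
             .Any();

    // Option 2 shape: projects each group to its count, then asks
    // whether any count exceeds the threshold.
    public static bool OptionTwo(IEnumerable<MyObject> items, int threshold) =>
        items.Where(l => l.IsActive)
             .GroupBy(l => l.Category)
             .Select(g => g.Count())
             .Any(c => c > threshold);

    public static void Main()
    {
        // Three active items in a single category: no group exceeds 300.
        var items = new List<MyObject>
        {
            new MyObject { Category = 1, IsActive = true },
            new MyObject { Category = 1, IsActive = true },
            new MyObject { Category = 1, IsActive = true },
        };

        Console.WriteLine(OptionOne(items, 300)); // prints "True"
        Console.WriteLine(OptionTwo(items, 300)); // prints "False"
    }
}
```

If the intent is "does any category have more than 300 active items", Option 2 expresses it directly. Option 1 would need `.Any(b => b)` (or the comparison moved into the predicate) to match.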
