Improving the performance of a moving average calculation: IEnumerable vs. List, foreach vs. for, ElementAt, ...

Question. Votes: 0. Answers: 1.

I created a class to compute an exponential moving average (EMA):

public class ExponentialMovingAverage {

  public Int32 Period { get; set; }

  public ExponentialMovingAverage(Int32 period = 20) {

    ArgumentOutOfRangeException.ThrowIfNegativeOrZero(period);

    Period = period;
    
  } 

  public override IEnumerable<(DateTimeOffset Stamp, Decimal? ExponentialMovingAverage)> Compute(IEnumerable<(DateTimeOffset Stamp, Decimal? Value)> inputs) {
    
    ArgumentNullException.ThrowIfNull(inputs);

    inputs = inputs.OrderBy(x => x.Stamp);

    Decimal? previous = null;

    Decimal factor = (Decimal)(2d / (Period + 1));

    Decimal? sum = 0;

    Int32 notNulls = 0;

    for (Int32 index = 0; index < inputs.Count(); index++) {

      (DateTimeOffset stamp, Decimal? value) = inputs.ElementAt(index);

      if (value == null) {
        notNulls++;
        yield return (stamp, null);   
        continue;  
      }

      if (index < notNulls + Period - 1) {
        sum += value;
        yield return (stamp, null);   
        continue;
      }
        
      if (index == notNulls + Period - 1) {
        sum += value;
        Decimal? sma = sum / Period;
        previous = sma;
        yield return (stamp, sma);  
        continue;  
      }

      Decimal? ema = previous + (factor * (value - previous));
      previous = ema;
      yield return (stamp, ema);

    } 
    
  } 

} 

It passes the following tests:

[Fact]
public void Test_AllNonNullInputs() {
    var ema = new ExponentialMovingAverage(3);

    var inputs = new List<(DateTimeOffset Stamp, decimal? Value)> {
        (new DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero), 10),
        (new DateTimeOffset(2024, 1, 2, 0, 0, 0, TimeSpan.Zero), 15),
        (new DateTimeOffset(2024, 1, 3, 0, 0, 0, TimeSpan.Zero), 20),
        (new DateTimeOffset(2024, 1, 4, 0, 0, 0, TimeSpan.Zero), 25),
        (new DateTimeOffset(2024, 1, 5, 0, 0, 0, TimeSpan.Zero), 30),
        (new DateTimeOffset(2024, 1, 6, 0, 0, 0, TimeSpan.Zero), 35)
    };
    var output = ema.Compute(inputs).ToList();

    Assert.Equal(6, output.Count);
    Assert.Null(output[0].ExponentialMovingAverage);
    Assert.Null(output[1].ExponentialMovingAverage);
    Assert.Equal(15m, output[2].ExponentialMovingAverage);
    Assert.Equal(20m, output[3].ExponentialMovingAverage);
    Assert.Equal(25m, output[4].ExponentialMovingAverage);
    Assert.Equal(30m, output[5].ExponentialMovingAverage);
}

[Fact]
public void Test_FirstTwoInputsAreNull() {
    var ema = new ExponentialMovingAverage(3);

    var inputs = new List<(DateTimeOffset Stamp, decimal? Value)> {
        (new DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero), null),
        (new DateTimeOffset(2024, 1, 2, 0, 0, 0, TimeSpan.Zero), null),
        (new DateTimeOffset(2024, 1, 3, 0, 0, 0, TimeSpan.Zero), 20),
        (new DateTimeOffset(2024, 1, 4, 0, 0, 0, TimeSpan.Zero), 25),
        (new DateTimeOffset(2024, 1, 5, 0, 0, 0, TimeSpan.Zero), 30),
        (new DateTimeOffset(2024, 1, 6, 0, 0, 0, TimeSpan.Zero), 35)
    };
    var output = ema.Compute(inputs).ToList();

    Assert.Equal(6, output.Count);
    Assert.Null(output[0].ExponentialMovingAverage);
    Assert.Null(output[1].ExponentialMovingAverage);
    Assert.Null(output[2].ExponentialMovingAverage);
    Assert.Null(output[3].ExponentialMovingAverage);
    Assert.Equal(25m, output[4].ExponentialMovingAverage);
    Assert.Equal(30m, output[5].ExponentialMovingAverage);
}

The computation seems slow, for example when computing the EMA of an EMA over many inputs:

var ema1 = new ExponentialMovingAverage(3);
var outputs1 = ema1.Compute(inputs);
var ema2 = new ExponentialMovingAverage(3);
var outputs2 = ema2.Compute(outputs1);

I have been looking into the use of inputs.ElementAt(index), as well as foreach vs. for and List vs. Enumerable.

How can I improve the code, including its performance?

c#
1 Answer
0 votes

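Your basic problem is that, because OrderBy() returns an IOrderedEnumerable<TSource> rather than an IList<TSource>, ElementAt(index) has to stream through the whole enumerable to reach the requested element. That in turn means the ordered query is re-evaluated on every call to ElementAt(), which is a substantial performance penalty, roughly n-squared in the number of inputs.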

To prevent this, replace the for loop with a foreach loop, or otherwise make sure you never enumerate the ordered enumerable more than once:

public class ExponentialMovingAverage {

    public int Period { get; }

    public ExponentialMovingAverage(int period = 20) {
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(period);
        Period = period;
    } 

    public /*override*/ IEnumerable<(DateTimeOffset Stamp, decimal? ExponentialMovingAverage)> Compute(IEnumerable<(DateTimeOffset Stamp, Decimal? Value)> inputs) {
        ArgumentNullException.ThrowIfNull(inputs);

        inputs = inputs.OrderBy(x => x.Stamp); // OrderBy() is deferred: the sort runs once, when the foreach below starts enumerating

        decimal? previous = null;
        decimal factor = (Decimal)(2d / (Period + 1));
        decimal? sum = 0;
        int notNulls = 0;

        int index = 0;
        foreach ((var stamp, var value) in inputs) 
        {
            if (value == null) {
                notNulls++;
                yield return (stamp, null);   
            }
            else if (index < notNulls + Period - 1) {
                sum += value;
                yield return (stamp, null);   
            }
            else if (index == notNulls + Period - 1) {
                sum += value;
                Decimal? sma = sum / Period;
                previous = sma;
                yield return (stamp, sma);  
            }
            else {
                Decimal? ema = previous + (factor * (value - previous));
                previous = ema;
                yield return (stamp, ema);
            }
            index++;
        } 
    } 
} 

Incidentally, I recommend making the Period property read-only.

Demo fiddle #1 here.

Alternatively, if for some reason you need random access to the sorted inputs, materialize them into a List<T> and use that:

public /*override*/ IEnumerable<(DateTimeOffset Stamp, decimal? ExponentialMovingAverage)> Compute(IEnumerable<(DateTimeOffset Stamp, Decimal? Value)> inputs) {
    ArgumentNullException.ThrowIfNull(inputs);

    var inputList = inputs.OrderBy(x => x.Stamp).ToList();

    decimal? previous = null;
    decimal factor = (Decimal)(2d / (Period + 1));
    decimal? sum = 0;
    int notNulls = 0;

    for (int index = 0; index < inputList.Count; index++) {
        (var stamp, var value) = inputList[index];

        if (value == null) {
            notNulls++;
            yield return (stamp, null);   
        }
        else if (index < notNulls + Period - 1) {
            sum += value;
            yield return (stamp, null);   
        }
        else if (index == notNulls + Period - 1) {
            sum += value;
            Decimal? sma = sum / Period;
            previous = sma;
            yield return (stamp, sma);  
        }
        else {
            Decimal? ema = previous + (factor * (value - previous));
            previous = ema;
            yield return (stamp, ema);
        }
    } 
} 

Demo fiddle #2 here.
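For the EMA-of-an-EMA case from the question, one option is to materialize each stage before passing it to the next, so that no stage is ever computed more than once. The sketch below repeats a condensed version of the foreach-based class above so that it compiles on its own. EmaChainDemo and EmaOfEma are hypothetical names introduced here for illustration, and the code assumes .NET 8 (for ThrowIfNegativeOrZero).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed copy of the foreach-based class above, repeated so this sketch is self-contained.
public class ExponentialMovingAverage
{
    public int Period { get; }

    public ExponentialMovingAverage(int period = 20)
    {
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(period);
        Period = period;
    }

    public IEnumerable<(DateTimeOffset Stamp, decimal? ExponentialMovingAverage)> Compute(
        IEnumerable<(DateTimeOffset Stamp, decimal? Value)> inputs)
    {
        ArgumentNullException.ThrowIfNull(inputs);

        decimal factor = (decimal)(2d / (Period + 1));
        decimal sum = 0;
        decimal previous = 0;
        int nulls = 0; // counts null inputs (the original code names this counter notNulls)
        int index = 0;

        // The ordered query is enumerated exactly once by this foreach.
        foreach (var (stamp, value) in inputs.OrderBy(x => x.Stamp))
        {
            if (value is null) { nulls++; yield return (stamp, null); }
            else if (index < nulls + Period - 1) { sum += value.Value; yield return (stamp, null); }
            else if (index == nulls + Period - 1)
            {
                // Seed the EMA with the simple average of the first Period values.
                sum += value.Value;
                previous = sum / Period;
                yield return (stamp, previous);
            }
            else
            {
                previous += factor * (value.Value - previous);
                yield return (stamp, previous);
            }
            index++;
        }
    }
}

// Hypothetical helper, not part of the original answer.
public static class EmaChainDemo
{
    public static List<(DateTimeOffset Stamp, decimal? ExponentialMovingAverage)> EmaOfEma(
        IEnumerable<(DateTimeOffset Stamp, decimal? Value)> inputs, int period)
    {
        // ToList() runs the first stage once and stores its results,
        // so later consumers never re-run the iterator behind it.
        var outputs1 = new ExponentialMovingAverage(period).Compute(inputs).ToList();
        return new ExponentialMovingAverage(period).Compute(outputs1).ToList();
    }
}
```

With the six non-null inputs from the question's first test and a period of 3, the second stage should yield null for its first four entries and then 20 and 25, by the same arithmetic as in Test_FirstTwoInputsAreNull. With the foreach-based Compute the intermediate ToList() is not strictly required, but it guards against the first stage being re-run if its output is consumed more than once.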

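Either way, you should not use Count() or ElementAt(), because they do not perform well unless the enumerable is actually an IList<T>, in which case you should use the list's Count and Item[int index] properties instead.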
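One way to apply this advice is a small helper that reuses the input when it already supports indexed access and copies it exactly once otherwise. This is only a sketch: EnumerableExtensions and AsReadOnlyListOnce are hypothetical names, not part of LINQ, and the helper assumes the caller does not mutate the underlying list while it is in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, introduced for illustration only.
public static class EnumerableExtensions
{
    // Returns the source itself when it is already an indexable list
    // (List<T> and arrays both implement IReadOnlyList<T>); otherwise
    // enumerates the source a single time and stores the result.
    public static IReadOnlyList<T> AsReadOnlyListOnce<T>(this IEnumerable<T> source)
    {
        ArgumentNullException.ThrowIfNull(source);
        return source as IReadOnlyList<T> ?? source.ToList();
    }
}
```

After a call such as inputs.OrderBy(x => x.Stamp).AsReadOnlyListOnce(), the loop can use the result's Count property and indexer without triggering any further enumeration of the query.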

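Incidentally, while the OrderBy() docs do not seem to state explicitly that it is evaluated every time, this is mentioned in Classification of Standard Query Operators by Manner of Execution: Deferred: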

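"Deferred execution means that the operation is not performed at the point in the code where the query is declared. The operation is performed only when the query variable is enumerated, for example by using a foreach statement. This means that the results of executing the query depend on the contents of the data source when the query is executed rather than when the query is defined. If the query variable is enumerated multiple times, the results might differ every time. Almost all the standard query operators whose return type is IEnumerable<T> or IOrderedEnumerable<TElement> execute in a deferred manner."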

OrderBy() uses deferred non-streaming execution, as stated in the classification table.
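The quoted behaviour is easy to observe with a plain list. In this minimal sketch (DeferredExecutionDemo is a hypothetical name), the same ordered query is enumerated twice, and the second enumeration picks up an element that was added after the query was defined.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, introduced for illustration only.
public static class DeferredExecutionDemo
{
    public static (int[] Before, int[] After) EnumerateTwice()
    {
        var source = new List<int> { 3, 1, 2 };

        // Nothing is sorted here; OrderBy() only describes the query.
        IEnumerable<int> ordered = source.OrderBy(x => x);

        int[] before = ordered.ToArray(); // sorts the three current elements

        source.Add(0);

        int[] after = ordered.ToArray();  // sorts again and now sees four elements
        return (before, after);
    }
}
```

Here Before holds 1, 2, 3 and After holds 0, 1, 2, 3, which shows that each enumeration of the query re-reads and re-sorts the source.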

Alternatively, you can easily confirm this with a bit of debugging by tracking how often OrderBy() invokes the key selector:

public /*override*/ IEnumerable<(DateTimeOffset Stamp, Decimal? ExponentialMovingAverage)> Compute(IEnumerable<(DateTimeOffset Stamp, Decimal? Value)> inputs) {

    ArgumentNullException.ThrowIfNull(inputs);

    int callCount = 0;
    inputs = inputs.OrderBy(x => { callCount++; return x.Stamp; });

    inputs.ElementAt(0);

    var firstCallCount = callCount;

    inputs.ElementAt(0);

    Assert.Equal(firstCallCount, callCount); // FAILS with Xunit.Sdk.EqualException: Assert.Equal() Failure: Values differ

The code above throws an exception because each call to ElementAt(0) ends up re-sorting the inputs.

Demo fiddle #3 here.
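The same counting trick can be extended to compare the two loop styles over a whole sequence. The following is a rough sketch rather than a benchmark; KeySelectorCounter and its method names are hypothetical, and the exact counts depend on the LINQ implementation of the runtime in use.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, introduced for illustration only.
public static class KeySelectorCounter
{
    // Key selector calls made by the question's for / Count() / ElementAt() pattern.
    public static int WithForAndElementAt(IEnumerable<int> source)
    {
        int calls = 0;
        IEnumerable<int> ordered = source.OrderBy(x => { calls++; return x; });

        for (int index = 0; index < ordered.Count(); index++)
        {
            // Every ElementAt() call has to process the whole source again.
            _ = ordered.ElementAt(index);
        }
        return calls;
    }

    // Key selector calls made when the ordered query is enumerated once.
    public static int WithForeach(IEnumerable<int> source)
    {
        int calls = 0;
        IEnumerable<int> ordered = source.OrderBy(x => { calls++; return x; });

        foreach (int item in ordered)
        {
            // A real loop would consume item here.
        }
        return calls;
    }
}
```

For a source of 100 elements, the foreach version should invoke the selector about 100 times, while the for version invokes it on the order of 100 × 100 times, which matches the n-squared cost described at the start of this answer.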
