不同窗口宽度的滚动平均值之间的成对差异

问题描述 投票:0回答:2

此代码旨在计算一系列窗口宽度的信号的滚动平均值(即平均多少个点),然后计算每个窗口宽度的平均值之间所有成对差异的总和。

import numpy as np

testData = np.random.normal(0,1,10000)

windowWidths = np.arange(1,1000)
sliding_averages = []
diffs = []

for windowWidth in windowWidths:
    # Rolling average using convolution
    sliding_average = np.convolve(testData, np.ones(windowWidth) / windowWidth, mode='valid') 
    
    # All pairwise differences
    pairwiseDiffs = (sliding_average[:,None] - sliding_average[None,:]) 

    # Define mask to extract only one corner of difference matrix
    mask = np.triu(np.ones_like(pairwiseDiffs, dtype=bool), k=1) 
    pairwiseDiffs = pairwiseDiffs[mask]

    pairwiseDiffsSqrd = pairwiseDiffs**2
    diffs.append(np.sum(pairwiseDiffsSqrd))

这旨在成为重现论文本节中描述的计算的一个组件:

计算:

Calculation

我的问题是是否有更有效的方法来运行此计算。

我尝试过进一步向量化并用其他方法替换卷积步骤,但对结果没有信心。

python numpy performance average pairwise
2个回答
0
投票

虽然我无法修复您在分析过程中可能发现的任何其他错误,但我可以回答您关于性能问题的原始问题。


查看您的代码,您似乎对平方差感兴趣,这恰好与方差的数学定义相似。如果我们确实改变变量的方差计算方式,我们可以得到一个公式,其中首先计算平方,然后才计算差值:

现在回顾一下你的代码:

import numpy as np

testData = np.random.normal(0, 1, 10000)

windowWidths = np.arange(1, 1000)
sliding_averages = []
diffs = []

for windowWidth in windowWidths:
    # Rolling average using convolution
    sliding_average = np.convolve(testData, np.ones(windowWidth) / windowWidth,
                                  mode='valid')

    # # All pairwise differences
    # pairwiseDiffs = (sliding_average[:, None] - sliding_average[None, :])
    #
    # # Define mask to extract only one corner of difference matrix
    # mask = np.triu(np.ones_like(pairwiseDiffs, dtype=bool), k=1)
    # pairwiseDiffs = pairwiseDiffs[mask]
    #
    # pairwiseDiffsSqrd = pairwiseDiffs ** 2
    # diffs.append(np.sum(pairwiseDiffsSqrd))

无论我们是先计算差值还是先计算平方,注释掉的部分应该表现相同:

sum_sq = np.sum(sliding_average ** 2)
sum_ = np.sum(sliding_average)
sum_sq_diff = sum_sq - sum_ * sum_
diffs.append(sum_sq_diff)

但是,我们不能忘记总体方差公式将总和除以观测值数量,而我们的代码中仍然完全缺少该公式,因此我们将其添加回来以获得:

import numpy as np

np.random.seed(0)
testData = np.random.normal(0, 1, 10000)
windowWidths = np.arange(1, 1000)

sliding_averages = []
diffs = []

for windowWidth in windowWidths:
    # Rolling average using convolution
    sliding_average = np.convolve(testData, np.ones(windowWidth) / windowWidth,
                                  mode='valid')

    sum_sq = np.sum(sliding_average ** 2)
    sum_ = np.sum(sliding_average)
    sum_sq_diff = len(sliding_average) * sum_sq - sum_ * sum_
    diffs.append(sum_sq_diff)

正如预期的那样,使用设置种子,这会在我的机器上返回相同的差异列表,执行时间为亚秒级。在进行统计计算时,通常可以完全避免屏蔽或矩阵运算,并且几乎总是计算结果的最快方法。


0
投票

当我意识到自己的错误后,这就是我想出的代码。有一些 Python 库可以实现相同的功能(例如 AllanTools),但我想确保它按照我的预期进行操作并加深我的理解。

def allanVariance(data):
    
    windowWidths = np.arange(1,int(len(data)/2), 2) #Can reduce sampling rate to speed up.
    diffs = []
    
    for windowWidth in windowWidths:
        # Rolling average at each window width
        sliding_average = np.convolve(data, np.ones(windowWidth) / windowWidth, mode='valid') 
        
        consecDiffs = sliding_average[windowWidth:] - sliding_average[:-windowWidth] 
        consecDiffs = np.array(consecDiffs)
    
        DiffsSqrd = consecDiffs**2
        diffs.append(np.sum(DiffsSqrd))
       
    diffs = (np.array(diffs))

    N = len(data)
    
    # Allan variance calculation (as expressed in Gattinger et al. Optics Express, Vol. 30, Issue 4, pp. 5222-5254, among others)
    allnVar = np.sqrt((1/(2*(N-(2*windowWidths)-1))) * diffs)
    
    return allnVar 
© www.soinside.com 2019 - 2024. All rights reserved.