Most efficient way to correlate a small sequence against a much larger one to find the best-matching index

Question

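In Python, I want to take a small sequence of numbers and find the region of a very large sequence of numbers that has the highest correlation with it.

Is there an efficient way to do this other than brute force, that is, taking a slice of the large sequence with the same length as the small one, computing the correlation coefficient, incrementing the start index, and repeating this over and over?

Does numpy or another math package have functionality that does this efficiently?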
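
The brute-force procedure described above can be sketched as follows, as a baseline for comparison. This is only an illustration: the function name `best_match_bruteforce` and the demo data are hypothetical, and skipping constant windows (where the Pearson coefficient is undefined) is an assumption about the desired behavior.

```python
import numpy as np


def best_match_bruteforce(small, big):
    """Return (start index, coefficient) of the window of `big` best correlated with `small`."""
    small = np.asarray(small, dtype=float)
    big = np.asarray(big, dtype=float)
    n = len(small)

    best_index, best_corr = -1, -np.inf
    # Slide a window of length n over big, including the final window.
    for i in range(len(big) - n + 1):
        window = big[i : i + n]
        if window.std() == 0:
            continue  # Pearson correlation is undefined for a constant window
        c = np.corrcoef(small, window)[0, 1]
        if c > best_corr:
            best_index, best_corr = i, c
    return best_index, best_corr


# Hypothetical demo data: plant a perfectly correlated region at index 1234.
rng = np.random.default_rng(0)
big = rng.integers(-10, 10, size=2_000).astype(float)
small = np.array([1, -1, 2, 3, 4, -5, 6], dtype=float)
big[1234:1241] = 2 * small + 1
index, corr = best_match_bruteforce(small, big)
print(index, corr)
```

Each window costs one `np.corrcoef` call from the Python interpreter, which is why this approach becomes slow for very large sequences.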

python algorithm numpy correlation
1 Answer

I suggest using numba for this task, for example:

import numba
import numpy as np  # needed for np.inf in find_corr

# https://stackoverflow.com/a/73663164/10035985
@numba.njit
def corr_nb(data1, data2):
    mean1 = data1.mean()
    mean2 = data2.mean()
    std1 = data1.std()
    std2 = data2.std()
    corr = ((data1 * data2).mean() - mean1 * mean2) / (std1 * std2)
    return corr


@numba.njit
def find_corr(small, big):
    n = len(small)

    curr_max_corr = -np.inf
    curr_max_index = -1

    for i in range(len(big) - n + 1):  # + 1 so the final window is included
        c = corr_nb(small, big[i : i + n])
        if c > curr_max_corr:
            curr_max_corr = c
            curr_max_index = i

    return curr_max_index

Benchmark with a 7-element small array and a 500_000-element big array:

from timeit import timeit

import numba
import numpy as np

np.random.seed(42)

big_arr = np.random.randint(low=-10, high=10, size=500_000, dtype="int8")

# some small_arr that we try to correlate:
small_arr = np.array([1, -1, 2, 3, 4, -5, 6], dtype="int8")


# https://stackoverflow.com/a/73663164/10035985
@numba.njit
def corr_nb(data1, data2):
    mean1 = data1.mean()
    mean2 = data2.mean()
    std1 = data1.std()
    std2 = data2.std()
    corr = ((data1 * data2).mean() - mean1 * mean2) / (std1 * std2)
    return corr


@numba.njit
def find_corr(small, big):
    n = len(small)

    curr_max_corr = -np.inf
    curr_max_index = -1

    for i in range(len(big) - n + 1):  # + 1 so the final window is included
        c = corr_nb(small, big[i : i + n])
        if c > curr_max_corr:
            curr_max_corr = c
            curr_max_index = i

    return curr_max_index


i = find_corr(small_arr, big_arr)
print("Index:", i)
print("Small:", small_arr)
print("Big:  ", big_arr[i : i + len(small_arr)])

t = timeit("find_corr(small_arr, big_arr)", number=1, globals=globals())
print(t)

On my machine (AMD 5700X), this prints:

Index: 74716
Small: [ 1 -1  2  3  4 -5  6]
Big:   [ 1 -3  2  3  3 -7  7]
0.041199692990630865
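
If numba is not available, the same windowed Pearson search can probably be expressed in plain NumPy. The following is a minimal sketch, not part of the original answer: it assumes NumPy 1.20 or newer (for `sliding_window_view`), the name `find_corr_numpy` and the demo data are hypothetical, and constant windows are assigned `-inf` so they are never selected. No timing is claimed for it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_corr_numpy(small, big):
    """Return (best start index, per-window Pearson coefficients)."""
    small = np.asarray(small, dtype=np.float64)
    big = np.asarray(big, dtype=np.float64)
    n = len(small)

    # View of shape (len(big) - n + 1, n); no data is copied here.
    windows = sliding_window_view(big, n)

    # Because small_centered sums to zero, this dot product divided by n
    # equals the population covariance of each window with small.
    small_centered = small - small.mean()
    cov = windows @ small_centered / n

    # Population standard deviations (ddof=0), matching the covariance above.
    denom = windows.std(axis=1) * small.std()

    # Constant windows have zero std; leave them at -inf instead of dividing.
    corr = np.full(len(windows), -np.inf)
    np.divide(cov, denom, out=corr, where=denom > 0)
    return int(np.argmax(corr)), corr


# Hypothetical demo data: plant a perfectly correlated region at index 4321.
rng = np.random.default_rng(0)
big_demo = rng.integers(-10, 10, size=100_000)
small_demo = np.array([1, -1, 2, 3, 4, -5, 6])
big_demo[4321:4328] = 3 * small_demo - 2
idx, corr = find_corr_numpy(small_demo, big_demo)
print("Index:", idx)
```

`np.argmax` returns the first maximum, so ties resolve to the earliest index, the same as the strict `>` comparison in `find_corr`. The intermediate arrays scale with the number of windows, so for very long inputs the numba loop may use less memory.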