2D convolution with missing data in Python

Question

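I know there is the scipy.signal.convolve2d function for 2D convolution of 2D numpy arrays, and the numpy.ma module for handling missing data, but the two don't seem to be compatible with each other (meaning that even if you mask a 2D array in numpy, convolve2d is not affected by the mask). Is there a way to handle missing values in a convolution using only the numpy and scipy packages?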

For example:

            1 - 3 4 5
            1 2 - 4 5
   Array =  1 2 3 - 5
            - 2 3 4 5
            1 2 3 4 -

  Kernel =  1  0
            0 -1

Desired result of convolve(Array, Kernel, boundary='wrap'):

               -1  - -1 -1 4
               -1 -1  - -1 4
    Result =   -1 -1 -1  - 5
                - -1 -1  4 4
                1 -1 -1 -1 -

Thanks to Aguy's suggestion, which is a very good way to compute the result of the convolution. Now suppose we can get the mask of Array from Array.mask, which gives us:

                   False True  False False False                       
                   False False True  False False
    Array.mask ==  False False False True  False
                   True  False False False False
                   False False False False True

How can I use this mask to turn the convolution result into a masked array?
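A minimal numpy-only sketch, assuming the numbers in the desired result are meant literally: they correspond to filling the missing cells with 0 and sliding the kernel without flipping it, anchored at its top-left element, with wrap-around edges (so each entry is Array[i, j] - Array[i+1, j+1]). The helper name masked_sliding_sum is hypothetical. The last step answers the follow-up question: the input mask is simply reattached with np.ma.masked_array.

```python
import numpy as np

def masked_sliding_sum(arr, kernel):
    """Hypothetical helper: zero-fill masked cells, apply `kernel` with
    wrap-around edges (not flipped, anchored at its top-left element),
    then re-apply the input mask to the result."""
    arr = np.ma.asarray(arr, dtype=float)
    filled = arr.filled(0.0)                  # missing cells contribute 0
    out = np.zeros_like(filled)
    for (m, n), w in np.ndenumerate(kernel):
        # rolling by (-m, -n) moves filled[i+m, j+n] to (i, j), wrapping at the edges
        out += w * np.roll(filled, shift=(-m, -n), axis=(0, 1))
    # reuse the input mask so the output has holes exactly where the input did
    return np.ma.masked_array(out, mask=np.ma.getmaskarray(arr))

nan = np.nan
array = np.ma.masked_invalid(np.array([[1, nan, 3, 4, 5],
                                       [1, 2, nan, 4, 5],
                                       [1, 2, 3, nan, 5],
                                       [nan, 2, 3, 4, 5],
                                       [1, 2, 3, 4, nan]]))
kernel = np.array([[1, 0],
                   [0, -1]])
result = masked_sliding_sum(array, kernel)
print(result)
```

Printing result reproduces the desired matrix above, with -- in the five masked positions.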

Answer 1

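I don't think replacing with 0 is the right way to do this: you are pushing the convolved values towards 0. These missing values should really be treated as "missing". Since they represent missing information, there is no reason to assume they could be 0, and they should not take part in any calculation at all.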
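To make the bias concrete, here is a small sketch (the field, the hole position and the 3x3 averaging kernel are illustrative assumptions): a constant field of 2.0 with one missing cell is zero-filled and smoothed with a 3x3 box kernel. The window centred on the hole averages eight 2s and one 0.

```python
import numpy as np
from scipy.signal import convolve2d

field = np.full((5, 5), 2.0)
field[2, 2] = np.nan                        # one missing value

zero_filled = np.nan_to_num(field, nan=0.0) # replace the missing value with 0
box = np.full((3, 3), 1 / 9)                # 3x3 moving average
smoothed = convolve2d(zero_filled, box, mode='same')

print(smoothed[2, 2])                       # about 1.78 (16/9) rather than 2
```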

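I tried setting the missing values to numpy.nan and then convolving. It turns out that any overlap between the kernel and any missing value gives nan in the result, even if the overlap is with a 0 in the kernel, so the holes of missing values in the result get enlarged. Depending on your application, this may be the desired result.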
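A quick way to see the enlarged hole is a plain numpy sliding window, in which every kernel weight, zeros included, is multiplied with the data (sliding_window_view needs NumPy 1.20 or later; the cross-shaped kernel is an illustrative choice, and being symmetric it gives the same result as a true convolution). One nan in a 7x7 field spoils all 9 outputs whose 3x3 window contains it, although only 5 of those windows put a non-zero weight on it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

field = np.ones((7, 7))
field[3, 3] = np.nan                        # a single missing value
cross = np.array([[0., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 0.]])            # corner weights are 0

windows = sliding_window_view(field, cross.shape)   # shape (5, 5, 3, 3)
out = (windows * cross).sum(axis=(-2, -1))          # 'valid' sliding sum
print(np.isnan(out).sum())                  # 9, because 0 * nan is still nan
```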

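But in some cases you don't want to throw away this much information because of a single missing value (maybe <= 50% of missing values is still tolerable). In such cases, I've found that another module, astropy, has a better implementation: numpy.nan values are ignored (or replaced by interpolation?).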

So with astropy, you would do the following:

import numpy
from astropy.convolution import convolve
inarray=numpy.where(inarray.mask,numpy.nan,inarray) # masking still doesn't work, has to set to numpy.nan
result=convolve(inarray,kernel)

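However, you still have no control over how much missing data is tolerated. To achieve that, I wrote a function that uses scipy.ndimage.convolve() for the initial convolution, but manually recomputes the values wherever missing values (numpy.nan) are involved: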

import numpy

def convolve2d(slab,kernel,max_missing=0.5,verbose=True):
    '''2D convolution with missings ignored

    <slab>: 2d array. Input array to convolve. Can have numpy.nan or masked values.
    <kernel>: 2d array, convolution kernel, must have sizes as odd numbers.
    <max_missing>: float in (0,1), max fraction of missing values in each convolution
                   window that is tolerated before a missing is placed in the result.

    Return <result>: 2d array, convolution result. Missings are represented as
                     numpy.nans if they are in <slab>, or masked if they are masked
                     in <slab>.

    '''

    from scipy.ndimage import convolve as sciconvolve

    assert numpy.ndim(slab)==2, "<slab> needs to be 2D."
    assert numpy.ndim(kernel)==2, "<kernel> needs to be 2D."
    assert kernel.shape[0]%2==1 and kernel.shape[1]%2==1, "<kernel> shape needs to be an odd number."
    assert max_missing > 0 and max_missing < 1, "<max_missing> needs to be a float in (0,1)."

    #--------------Get mask for missings--------------
    if not hasattr(slab,'mask') and numpy.any(numpy.isnan(slab))==False:
        has_missing=False
        slab2=slab.copy()

    elif not hasattr(slab,'mask') and numpy.any(numpy.isnan(slab)):
        has_missing=True
        slabmask=numpy.where(numpy.isnan(slab),1,0)
        slab2=slab.copy()
        missing_as='nan'

    elif (slab.mask.size==1 and slab.mask==False) or numpy.any(slab.mask)==False:
        has_missing=False
        slab2=slab.copy()

    elif not (slab.mask.size==1 and slab.mask==False) and numpy.any(slab.mask):
        has_missing=True
        slabmask=numpy.where(slab.mask,1,0)
        slab2=numpy.where(slabmask==1,numpy.nan,slab)
        missing_as='mask'

    else:
        has_missing=False
        slab2=slab.copy()

    #--------------------No missing--------------------
    if not has_missing:
        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
    else:
        H,W=slab.shape
        hh=int((kernel.shape[0]-1)/2)  # half height
        hw=int((kernel.shape[1]-1)/2)  # half width
        min_valid=(1-max_missing)*kernel.shape[0]*kernel.shape[1]

        # don't forget to flip the kernel: the manual window sums below must match sciconvolve, which flips it
        kernel_flip=kernel[::-1,::-1]

        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
        slab2=numpy.where(slabmask==1,0,slab2)

        #------------------Get nan holes------------------
        miss_idx=zip(*numpy.where(slabmask==1))

        if missing_as=='mask':
            mask=numpy.zeros([H,W],dtype=bool)

        for yii,xii in miss_idx:

            #-------Recompute at each new nan in result-------
            hole_ys=range(max(0,yii-hh),min(H,yii+hh+1))
            hole_xs=range(max(0,xii-hw),min(W,xii+hw+1))

            for hi in hole_ys:
                for hj in hole_xs:
                    hi1=max(0,hi-hh)
                    hi2=min(H,hi+hh+1)
                    hj1=max(0,hj-hw)
                    hj2=min(W,hj+hw+1)

                    slab_window=slab2[hi1:hi2,hj1:hj2]
                    mask_window=slabmask[hi1:hi2,hj1:hj2]
                    kernel_ij=kernel_flip[max(0,hh-hi):min(hh*2+1,hh+H-hi), 
                                     max(0,hw-hj):min(hw*2+1,hw+W-hj)]
                    kernel_ij=numpy.where(mask_window==1,0,kernel_ij)

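                    #----Fill with missing if not enough valid data----
                    # count the valid (non-missing) cells in the window; summing the
                    # kernel weights only equals this count for a kernel of ones
                    n_valid=numpy.sum(mask_window==0)
                    if n_valid<min_valid: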
                        if missing_as=='nan':
                            result[hi,hj]=numpy.nan
                        elif missing_as=='mask':
                            result[hi,hj]=0.
                            mask[hi,hj]=True
                    else:
                        result[hi,hj]=numpy.sum(slab_window*kernel_ij)

        if missing_as=='mask':
            result=numpy.ma.array(result)
            result.mask=mask

    return result

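The figure below demonstrates the output. On the left is a 30x30 random map with three numpy.nan holes of sizes 1x1, 3x3 and 5x5. On the right is the output of the convolution with a 5x5 kernel of all ones, at a tolerance level of 50% (max_missing=0.5).

So the first two, smaller holes are filled using nearby values, whereas in the last hole the number of missing values exceeds 0.5x5x5 = 12.5, so numpy.nan is placed there to represent the missing information.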
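The missing-value count that drives this decision can be checked directly by convolving the missing-value indicator with a 5x5 kernel of ones. The hole positions below are assumptions, since the figure does not give them; only the hole sizes and the 12.5 threshold come from the text.

```python
import numpy as np
from scipy.ndimage import convolve

# Hypothetical hole positions on a 30x30 grid
missing = np.zeros((30, 30))
missing[5, 5] = 1             # 1x1 hole
missing[14:17, 14:17] = 1     # 3x3 hole centred at (15, 15)
missing[22:27, 22:27] = 1     # 5x5 hole centred at (24, 24)

# number of missing cells inside the 5x5 window centred on each pixel
n_missing = convolve(missing, np.ones((5, 5)), mode='constant', cval=0.0)

max_missing = 0.5
too_many = n_missing > max_missing * 25   # threshold 12.5

for centre in [(5, 5), (15, 15), (24, 24)]:
    print(centre, int(n_missing[centre]), bool(too_many[centre]))
# (5, 5) 1 False
# (15, 15) 9 False
# (24, 24) 25 True
```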

Answer 2

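I found a hack: run the convolution with imaginary numbers instead of nan (change nan to 1j), and wherever the imaginary part of the result is above a threshold, set it to nan; wherever it is below, take the real value. Here is a code snippet: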

import numpy as np
from scipy import signal

# frames_: float array with np.nan for missing values; gaussian_window: the kernel
frames_complex = np.zeros_like(frames_, dtype=np.complex64)
frames_complex[np.isnan(frames_)] = 1j
frames_complex[~np.isnan(frames_)] = frames_[~np.isnan(frames_)]
convolution = signal.convolve(frames_complex, gaussian_window, 'valid')
convolution[np.imag(convolution) > 0.2] = np.nan
convolution = convolution.real.astype(np.float32)  # drop the imaginary part explicitly
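A small self-contained check of why this works (the 3x4 array, the 2x2 averaging kernel and the 0.2 threshold are illustrative): after replacing nan with 1j, the imaginary part of each output is the total kernel weight that fell on missing cells, while the real part is the convolution with the missing cells counted as 0.

```python
import numpy as np
from scipy import signal

data = np.array([[1., 1., 1., 1.],
                 [1., np.nan, 1., 1.],
                 [1., 1., 1., 1.]])
kernel = np.full((2, 2), 0.25)              # weights sum to 1

z = np.where(np.isnan(data), 1j, data)      # nan -> 1j, valid values stay real
conv = signal.convolve2d(z, kernel, mode='valid')

print(conv.imag)   # 0.25 wherever the window covers the nan, 0 elsewhere
print(conv.real)   # 0.75 in those windows: the nan was counted as 0
result = np.where(conv.imag > 0.2, np.nan, conv.real)
```

With a threshold above 0.25 those windows would be kept, but with the biased value 0.75; dividing the real part by 1 - imag (valid here because the weights sum to 1) gives back 1.0, which is the kind of correction the next answer adds.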

Answer 3

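Based on Ilan Schvartzman's idea in the previous answer, here is an improved version. It additionally compensates for the missing values being set to 0 (in the real part), and supports normalization by np.sum(in2). Both can be controlled via the parameters correct_missing and norm, respectively. For a 1d version, just replace scipy.signal.convolve2d with scipy.signal.convolve.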
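Written out with notation introduced here for illustration (w_k are the kernel weights in2, S = Σ_k w_k, x_k the input values, and V and M the valid and masked positions under the kernel at one output pixel), filling masked values with 1j gives the quantities that the function manipulates:

```latex
c = \sum_{k \in V} w_k x_k + i \sum_{k \in M} w_k,
\qquad
\operatorname{Re} c = \sum_{k \in V} w_k x_k,
\qquad
S - \operatorname{Im} c = \sum_{k \in V} w_k

\text{masked if } \left| \operatorname{Im} c / S \right| > \texttt{valid\_ratio},
\qquad
y = \operatorname{Re} c \cdot \frac{S}{S - \operatorname{Im} c} \ (\texttt{correct\_missing}),
\qquad
y \leftarrow y / S \ (\texttt{norm})
```

With both options enabled (and S - Im c ≠ 0), y = Σ_{k∈V} w_k x_k / Σ_{k∈V} w_k, the kernel-weighted mean over the valid values only.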

import scipy.signal
import numpy as np

def masked_convolve2d(in1, in2, correct_missing=True, norm=True, valid_ratio=1./3., *args, **kwargs):
    """A workaround for np.ma.MaskedArray in scipy.signal.convolve. 
    It converts the masked values to complex values=1j. The complex space allows to set a limit
    for the imaginary convolution. The function uses a ratio `valid_ratio` of np.sum(in2) as
    an upper limit on the imaginary part; values above it are masked.
    I.e. in1=[[1.,1.,--,--]] in2=[[1.,1.]] -> imaginary_part/sum(in2): [[1., 1., .5, 0.]]
    -> valid_ratio=.5 -> out:[[1., 1., .5, --]].
    PARAMETERS
    ---------
    in1 : array_like
        First input.
    in2 : array_like
        Second input. Should have the same number of dimensions as `in1`.
    correct_missing : bool, optional
        correct the value of the convolution as a sum over valid data only, 
        as masked values account 0 in the real space of the convolution.
    norm : bool, optional
        if the output should be normalized to np.sum(in2).
    valid_ratio: float, optional
        the upper limit of the imaginary convolution to mask values. Defined by the ratio of np.sum(in2).
    *args, **kwargs: optional
        passed to scipy.signal.convolve2d(..., *args, **kwargs)
    """
    if not isinstance(in1, np.ma.MaskedArray):
        in1 = np.ma.array(in1)
    
    # np.complex128 -> stores real as np.float64
    con = scipy.signal.convolve2d(in1.astype(np.complex128).filled(fill_value=1j), 
                                  in2.astype(np.complex128), 
                                  *args, **kwargs
                                 )
    
    # split complex128 to two float64s
    con_imag = con.imag
    con = con.real
    mask = np.abs(con_imag/np.sum(in2)) > valid_ratio
    
    # correct_missing: rescale by sum(in2) / (sum(in2) - con_imag), i.e. by the inverse
    # of the fraction of kernel weight that fell on valid values, to get their mean
    if correct_missing:
        correction = np.sum(in2) - con_imag
        con[correction!=0] *= np.sum(in2)/correction[correction!=0]
        
    if norm:
        con /= np.sum(in2)
        
    return np.ma.array(con, mask=mask)

Example

An example showing the difference that correct_missing makes for this input:

in1 = np.ones((1, 6))
in1[:, 4:] = 0
in1 = np.ma.masked_equal(in1, 0)

in2 = np.ones((1, 4))

in1
>>> array([[1.0, 1.0, 1.0, 1.0, --, --]])

The masked convolution with correct_missing is:

masked_convolve2d(in1, in2, correct_missing=True, mode='valid', norm=True)
>>> masked_array(data=[[1.0, 1.0, --]],
                 mask=[[False, False,  True]])

Without the correction, if you want to fill the masked values with np.nan:

b = masked_convolve2d(in1, in2, correct_missing=False, mode='valid', norm=True)
b.filled(np.nan)
>>> array([[1.  , 0.75,  nan]])

Performance test

I tested my version (masked_convolve2d) against Jason's code (convolve2d) on the following input:

in1 = np.ones((10,6))
in1[:, 4:] = 0
in1 = np.ma.masked_equal(in1, 0)

in2 = np.ones((3,3))  # <kernel> shape needs to be an odd number for Jason's code

and got the following results on my machine:

%timeit -n 10 -r 10 convolve2d(in1, in2)
>>> 2.44 ms ± 117 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

%timeit -n 10 -r 10 masked_convolve2d(in1, in2)
>>> 131 µs ± 19.1 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

Answer 4

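I found a simple solution that may help:

Set the missing data to 0. Call this matrix M0.
Make a copy of the data in which non-missing values are 1 and missing values are 0. Call this matrix M1.
Apply the convolution to the original data (M0). The result will be "wrong", because it includes the zeros that stand in for the missing data. Call it R0.
Apply the convolution to the copy of zeros and ones (M1). The result is a "correction" matrix: for a kernel whose weights sum to 1, values close to 0 mean that a lot of missing data was used at that point, and values close to 1 mean that almost no missing data was used. Call it R1.
Divide the first result (R0) by the "correction" matrix (R1) to normalize the result.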
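The five steps above can be sketched with scipy.signal.convolve2d as follows. This is a minimal sketch assuming missing values are marked with np.nan and the kernel weights are non-negative; the function name normalized_convolve2d and the min_weight cut-off are hypothetical additions. Because convolve2d pads both M0 and M1 with zeros by default, the border effect is normalized away in the same way as the holes.

```python
import numpy as np
from scipy.signal import convolve2d

def normalized_convolve2d(data, kernel, min_weight=1e-12):
    """Hypothetical helper implementing the M0/M1/R0/R1 recipe.

    data: 2D float array with np.nan marking missing values.
    Outputs whose window holds (almost) no valid weight are set to np.nan.
    """
    valid = ~np.isnan(data)
    m0 = np.where(valid, data, 0.0)            # M0: missing values set to 0
    m1 = valid.astype(float)                   # M1: 1 where data exist, 0 where missing
    r0 = convolve2d(m0, kernel, mode='same')   # R0: biased towards 0 near holes
    r1 = convolve2d(m1, kernel, mode='same')   # R1: kernel weight that fell on valid data
    out = np.full(r0.shape, np.nan)
    ok = r1 > min_weight                       # avoid dividing by (almost) zero
    out[ok] = r0[ok] / r1[ok]
    return out

field = np.full((5, 5), 2.0)
field[2, 2] = np.nan
smoothed = normalized_convolve2d(field, np.full((3, 3), 1 / 9))
print(np.allclose(smoothed, 2.0))   # True: the hole no longer drags values towards 0
```

A threshold on R1 (for example, requiring at least half of kernel.sum()) could give the same kind of control as max_missing in the first answer.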

Hope it helps!
