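I know there is the scipy.signal.convolve2d function for 2D convolution of 2D numpy arrays, and the numpy.ma module for handling missing data, but the two do not seem to be compatible with each other (that is, even if you mask a 2D array in numpy, the computation in convolve2d is not affected). Is there any way to handle missing values in a convolution using only the numpy and scipy packages?
For example: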
        1 - 3 4 5
        1 2 - 4 5
Array = 1 2 3 - 5
        - 2 3 4 5
        1 2 3 4 -

Kernel = 1  0
         0 -1
Expected result of convolution(Array, Kernel, boundary='wrap'):
         -1  - -1 -1  4
         -1 -1  - -1  4
Result = -1 -1 -1  -  5
          - -1 -1  4  4
          1 -1 -1 -1  -
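Thanks to Aguy for the suggestion; that is a really good way to compute the result of the convolution. Now suppose we can get the mask of Array from Array.mask, which gives us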
              False  True False False False
              False False  True False False
Array.mask == False False False  True False
               True False False False False
              False False False False  True
How can I use this mask to convert the result of the convolution into a masked array?
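One minimal way to turn the plain convolution output back into a masked array is to pass the original mask to numpy.ma.array. The sketch below assumes the zero-fill approach was used for the convolution itself and that the output has the same shape as the input (mode='same'). The variable names are illustrative, and because scipy.signal.convolve2d flips the kernel, the unmasked values are not guaranteed to match the hand-written table above.

```python
import numpy as np
from scipy.signal import convolve2d

# The 5x5 example from the question: column j holds the value j + 1,
# with one masked entry per row.
data = np.tile(np.arange(1.0, 6.0), (5, 1))
mask = np.zeros((5, 5), dtype=bool)
mask[[0, 1, 2, 3, 4], [1, 2, 3, 0, 4]] = True
arr = np.ma.array(data, mask=mask)

kernel = np.array([[1, 0], [0, -1]])

# Assumption: masked entries contribute 0 to each sum (zero-fill approach).
filled = arr.filled(0.0)
raw = convolve2d(filled, kernel, mode="same", boundary="wrap")

# convolve2d returns a plain ndarray; re-attach the original mask.
result = np.ma.array(raw, mask=arr.mask)
print(result)
```

This only masks the cells that were missing in the input; cells whose neighbours were missing still carry values computed with zeros.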
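I don't think replacing with 0 is the correct way to do this: you are nudging the convolved values towards 0. Those missing values should literally be treated as "missing". They represent missing information, there is no justification for assuming they could be 0, and they should not take part in any computation at all.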
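I tried setting the missing values to numpy.nan and then convolving. The result suggests that any overlap between the kernel and a missing value gives a nan in the result, even where the overlap is with a 0 in the kernel, so the hole of missing values in the result grows. Depending on your application, this may be the desired outcome.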
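The growth of the hole is easy to reproduce. The following is a small illustration (not the answerer's original test) using scipy.signal.convolve2d with an all-ones 3x3 kernel: a single NaN in the input becomes a 3x3 block of NaNs in the output.

```python
import numpy as np
from scipy.signal import convolve2d

data = np.ones((5, 5))
data[2, 2] = np.nan            # a single missing value in the middle

kernel = np.ones((3, 3))
out = convolve2d(data, kernel, mode="same")

print(np.isnan(data).sum())    # NaN count in the input
print(np.isnan(out).sum())     # NaN count after the convolution
print(np.isnan(out).astype(int))
```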
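But in some cases you don't want to discard so much information for just 1 missing value (perhaps <= 50% missing is still tolerable). For such cases, I've found that another module, astropy, has a better implementation: numpy.nan values are ignored (or replaced with interpolated values?). So using astropy, you would do the following: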
import numpy
from astropy.convolution import convolve
inarray = numpy.where(inarray.mask, numpy.nan, inarray)  # masking still doesn't work, has to be set to numpy.nan
result = convolve(inarray, kernel)
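For readers restricted to numpy and scipy, the idea of ignoring NaNs can be approximated with a normalized convolution: convolve the zero-filled data, convolve the validity weights with the same kernel, and divide. This is a sketch of the general technique, not astropy's implementation; nan_ignoring_convolve is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import convolve2d

def nan_ignoring_convolve(data, kernel):
    """Weighted average over the valid (non-NaN) cells in each window."""
    valid = ~np.isnan(data)
    filled = np.where(valid, data, 0.0)
    num = convolve2d(filled, kernel, mode="same")
    den = convolve2d(valid.astype(float), kernel, mode="same")
    with np.errstate(invalid="ignore", divide="ignore"):
        out = num / den
    # Windows that contain no valid weight at all stay missing.
    out[den == 0] = np.nan
    return out

data = np.ones((5, 5))
data[2, 2] = np.nan
kernel = np.ones((3, 3)) / 9.0
smoothed = nan_ignoring_convolve(data, kernel)
print(smoothed)
```

Because the result is renormalized by the valid weight, it behaves like a local mean; it is not a drop-in replacement for kernels that sum to zero, such as difference kernels.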
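However, you still have no control over how much missing data is tolerated. To achieve that, I've created a function that uses scipy.ndimage.convolve() for the initial convolution, but manually recomputes the values wherever missing values (numpy.nan) are involved: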
import numpy

def convolve2d(slab,kernel,max_missing=0.5,verbose=True):
    '''2D convolution with missings ignored

    <slab>: 2d array. Input array to convolve. Can have numpy.nan or masked values.
    <kernel>: 2d array, convolution kernel, must have sizes as odd numbers.
    <max_missing>: float in (0,1), max percentage of missing in each convolution
                   window is tolerated before a missing is placed in the result.

    Return <result>: 2d array, convolution result. Missings are represented as
                     numpy.nans if they are in <slab>, or masked if they are masked
                     in <slab>.
    '''
    from scipy.ndimage import convolve as sciconvolve

    assert numpy.ndim(slab)==2, "<slab> needs to be 2D."
    assert numpy.ndim(kernel)==2, "<kernel> needs to be 2D."
    assert kernel.shape[0]%2==1 and kernel.shape[1]%2==1, "<kernel> shape needs to be an odd number."
    assert max_missing > 0 and max_missing < 1, "<max_missing> needs to be a float in (0,1)."

    #--------------Get mask for missings--------------
    if not hasattr(slab,'mask') and numpy.any(numpy.isnan(slab))==False:
        has_missing=False
        slab2=slab.copy()
    elif not hasattr(slab,'mask') and numpy.any(numpy.isnan(slab)):
        has_missing=True
        slabmask=numpy.where(numpy.isnan(slab),1,0)
        slab2=slab.copy()
        missing_as='nan'
    elif (slab.mask.size==1 and slab.mask==False) or numpy.any(slab.mask)==False:
        has_missing=False
        slab2=slab.copy()
    elif not (slab.mask.size==1 and slab.mask==False) and numpy.any(slab.mask):
        has_missing=True
        slabmask=numpy.where(slab.mask,1,0)
        slab2=numpy.where(slabmask==1,numpy.nan,slab)
        missing_as='mask'
    else:
        has_missing=False
        slab2=slab.copy()

    #--------------------No missing--------------------
    if not has_missing:
        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
    else:
        H,W=slab.shape
        hh=int((kernel.shape[0]-1)/2)  # half height
        hw=int((kernel.shape[1]-1)/2)  # half width
        min_valid=(1-max_missing)*kernel.shape[0]*kernel.shape[1]

        # don't forget to flip the kernel
        kernel_flip=kernel[::-1,::-1]

        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
        slab2=numpy.where(slabmask==1,0,slab2)

        #------------------Get nan holes------------------
        miss_idx=zip(*numpy.where(slabmask==1))

        if missing_as=='mask':
            mask=numpy.zeros([H,W])

        for yii,xii in miss_idx:

            #-------Recompute at each new nan in result-------
            hole_ys=range(max(0,yii-hh),min(H,yii+hh+1))
            hole_xs=range(max(0,xii-hw),min(W,xii+hw+1))

            for hi in hole_ys:
                for hj in hole_xs:
                    hi1=max(0,hi-hh)
                    hi2=min(H,hi+hh+1)
                    hj1=max(0,hj-hw)
                    hj2=min(W,hj+hw+1)

                    slab_window=slab2[hi1:hi2,hj1:hj2]
                    mask_window=slabmask[hi1:hi2,hj1:hj2]
                    kernel_ij=kernel_flip[max(0,hh-hi):min(hh*2+1,hh+H-hi),
                                          max(0,hw-hj):min(hw*2+1,hw+W-hj)]
                    kernel_ij=numpy.where(mask_window==1,0,kernel_ij)

                    #----Fill with missing if not enough valid data----
                    ksum=numpy.sum(kernel_ij)
                    if ksum<min_valid:
                        if missing_as=='nan':
                            result[hi,hj]=numpy.nan
                        elif missing_as=='mask':
                            result[hi,hj]=0.
                            mask[hi,hj]=True
                    else:
                        result[hi,hj]=numpy.sum(slab_window*kernel_ij)

        if missing_as=='mask':
            result=numpy.ma.array(result)
            result.mask=mask

    return result
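The figure below demonstrates the output. On the left is a 30x30 random map with 3 numpy.nan holes of different sizes. On the right is the output of the convolution with a 5x5 kernel (all 1s) and a tolerance level of 50% (max_missing=0.5). The first 2, smaller holes are filled using nearby values, while in the last hole the number of missing values exceeds 0.5x5x5 = 12.5, so numpy.nan is placed there to represent the missing information.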
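The tolerance rule can also be checked without explicit loops by convolving the missing-value mask with an all-ones kernel, which counts the missing cells in each window. The sketch below is an illustration of that arithmetic only: missing_fraction is a hypothetical helper, the hole sizes (1x1 and 5x5) are chosen for the example, and cells outside the array are counted as present, which is a simplification at the borders.

```python
import numpy as np
from scipy.ndimage import convolve

def missing_fraction(nan_mask, kernel_shape):
    """Fraction of missing cells inside each kernel-sized window."""
    ones = np.ones(kernel_shape)
    counts = convolve(nan_mask.astype(float), ones, mode="constant", cval=0.0)
    return counts / ones.size

nan_mask = np.zeros((20, 20), dtype=bool)
nan_mask[3, 3] = True              # illustrative 1x1 hole
nan_mask[10:15, 10:15] = True      # illustrative 5x5 hole

frac = missing_fraction(nan_mask, (5, 5))
too_sparse = frac > 0.5            # same role as max_missing=0.5

print(frac[3, 3], too_sparse[3, 3])      # small hole: 1 of 25 cells missing
print(frac[12, 12], too_sparse[12, 12])  # centre of the large hole: all 25 missing
```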
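I found a hack. Instead of nan, use imaginary numbers (change each nan to 1j), run the convolution, and wherever the imaginary part is above a threshold, set the result to nan; wherever it is below, take the real value. Here is a code snippet: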
import numpy as np
from scipy import signal

# frames_ (float array containing nan) and gaussian_window are defined elsewhere
valid = ~np.isnan(frames_)
frames_complex = np.zeros_like(frames_, dtype=np.complex64)
frames_complex[~valid] = 1j
frames_complex[valid] = frames_[valid]
convolution = signal.convolve(frames_complex, gaussian_window, 'valid')
convolution[np.imag(convolution) > 0.2] = np.nan
# keep only the real part before casting back to float
convolution = np.real(convolution).astype(np.float32)
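To see what the imaginary part measures, here is a small self-contained 1D illustration. It is a sketch with made-up data; nan_convolve_1d and its max_imag parameter are hypothetical names. With a window that sums to 1, the imaginary part of each output equals the share of the window weight that fell on missing values.

```python
import numpy as np
from scipy import signal

def nan_convolve_1d(x, window, max_imag):
    """Convolve x, marking outputs as NaN when too much weight hit NaNs."""
    z = np.where(np.isnan(x), 1j, x).astype(np.complex128)
    conv = signal.convolve(z, window, mode="valid", method="direct")
    out = conv.real.copy()
    out[conv.imag > max_imag] = np.nan
    return out

x = np.array([1.0, 1.0, np.nan, 1.0, 1.0, 1.0])
window = np.full(3, 1.0 / 3.0)

# Each window touching the NaN has an imaginary part of 1/3.
strict = nan_convolve_1d(x, window, max_imag=0.2)   # 1/3 > 0.2: rejected
loose = nan_convolve_1d(x, window, max_imag=0.5)    # 1/3 <= 0.5: kept
print(strict)
print(loose)
```

In the tolerant case the kept values are pulled towards 0 (2/3 instead of 1), because the missing value contributes nothing to the real part.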
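Building on the idea in Ilan Schvartzman's earlier answer, here is an improved version. In addition, it can compensate for the missing values being set to 0 (in real space), and it supports normalization by np.sum(in2). Both can be switched with the parameters correct_missing and norm, respectively. For a 1D version, simply replace scipy.signal.convolve2d with scipy.signal.convolve.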
import scipy.signal
import numpy as np

def masked_convolve2d(in1, in2, correct_missing=True, norm=True, valid_ratio=1./3., *args, **kwargs):
    """A workaround for np.ma.MaskedArray in scipy.signal.convolve.
    It converts the masked values to complex values=1j. The complex space allows to set a limit
    for the imaginary convolution. The function uses a ratio `valid_ratio` of np.sum(in2) to
    set a lower limit on the imaginary part to mask the values.
    I.e. in1=[[1.,1.,--,--]] in2=[[1.,1.]] -> imaginary_part/sum(in2): [[1., 1., .5, 0.]]
    -> valid_ratio=.5 -> out:[[1., 1., .5, --]].

    PARAMETERS
    ---------
    in1 : array_like
        First input.
    in2 : array_like
        Second input. Should have the same number of dimensions as `in1`.
    correct_missing : bool, optional
        correct the value of the convolution as a sum over valid data only,
        as masked values account 0 in the real space of the convolution.
    norm : bool, optional
        if the output should be normalized to np.sum(in2).
    valid_ratio: float, optional
        the upper limit of the imaginary convolution to mask values. Defined by the ratio of np.sum(in2).
    *args, **kwargs: optional
        passed to scipy.signal.convolve2d(..., *args, **kwargs)
    """
    if not isinstance(in1, np.ma.MaskedArray):
        in1 = np.ma.array(in1)

    # np.complex128 -> stores real as np.float64
    con = scipy.signal.convolve2d(in1.astype(np.complex128).filled(fill_value=1j),
                                  in2.astype(np.complex128),
                                  *args, **kwargs
                                  )

    # split complex128 to two float64s
    con_imag = con.imag
    con = con.real
    mask = np.abs(con_imag/np.sum(in2)) > valid_ratio

    # con_east.real / (1. - con_east.imag): correction, to get the mean over all valid values
    # con_east.imag > percent: how many percent of the single convolution value have to be from valid values
    if correct_missing:
        correction = np.sum(in2) - con_imag
        con[correction!=0] *= np.sum(in2)/correction[correction!=0]

    if norm:
        con /= np.sum(in2)

    return np.ma.array(con, mask=mask)
An example showing the effect of correct_missing, using the input:
in1 = np.ones((1, 6))
in1[:, 4:] = 0
in1 = np.ma.masked_equal(in1, 0)
in2 = np.ones((1, 4))
in1
>>> array([[1.0, 1.0, 1.0, 1.0, --, --]])
The masked convolution with correct_missing is:
masked_convolve2d(in1, in2, correct_missing=True, mode='valid', norm=True)
>>> masked_array(data=[[1.0, 1.0, --]],
mask=[[False, False, True]])
Without the correction, and if you want to fill the masked values with np.nan:
b = masked_convolve2d(in1, in2, correct_missing=False, mode='valid', norm=True)
b.filled(np.nan)
>>> array([[1.  , 0.75,  nan]])
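The two outputs above can be reproduced by hand. The following sketch repeats only the arithmetic of the example (it does not call masked_convolve2d); the arrays real and imag are written out manually for the three 'valid' windows of in1 = [1, 1, 1, 1, --, --] with in2 = np.ones((1, 4)).

```python
import numpy as np

kernel_sum = 4.0                    # np.sum(in2) for in2 = np.ones((1, 4))
real = np.array([4.0, 3.0, 2.0])    # sum of the valid values in each window
imag = np.array([0.0, 1.0, 2.0])    # kernel weight that fell on masked values

# A window is masked when more than valid_ratio (default 1/3) of it is missing.
mask = imag / kernel_sum > 1.0 / 3.0

# norm=True only: masked values count as 0, so the mean is biased low.
uncorrected = real / kernel_sum

# correct_missing=True: rescale by the weight that actually saw valid data.
corrected = real * kernel_sum / (kernel_sum - imag) / kernel_sum

print(mask)
print(uncorrected)
print(corrected)
```

The third window is masked in both cases, which is why 0.5 shows up as nan and the corrected third value never appears in the output.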
I tested my version (masked_convolve2d) against Jason's code (convolve2d) with the input:
in1 = np.ones((10,6))
in1[:, 4:] = 0
in1 = np.ma.masked_equal(in1, 0)
in2 = np.ones((3,3)) # <kernel> shape needs to be an odd number for Jason's code
On my machine, this gives the following results:
%timeit -n 10 -r 10 convolve2d(in1, in2)
>>> 2.44 ms ± 117 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
%timeit -n 10 -r 10 masked_convolve2d(in1, in2)
>>> 131 µs ± 19.1 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
I found a simple solution that might help:
Hope it helps!