How can I effectively use Dask's `map_overlap` function with two input arrays of different sizes?

Problem description

I am working with two variables that can be thought of as matrices: "a" with dimensions (100, 100) and "b" with dimensions (200, 200). My goal is to perform a computation involving both "a" and "b", but first I need to upsample "a" by a factor of 2 so that it matches the size of "b". I then intend to apply my custom function, crucially using Dask's `map_overlap` function so that it runs efficiently with a specified overlap. In the end, I want a result the same size as "b". Below is Python code demonstrating this scenario.

import dask.array as da
import numpy as np
from scipy.ndimage import zoom

data = da.ones((40, 140, 140), chunks=(10, 40, 40))
data2 = da.ones((80, 280, 280), chunks=(10, 40, 40))
data_upsampled = da.map_blocks(lambda x: zoom(x, 2), data, dtype=np.uint16, chunks=(20, 80, 80))

res = da.map_overlap(lambda x, y: x + y, data_upsampled, data2, depth=(5, 10, 10), boundary="constant")

But I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 10
      7 data2 = da.ones((80, 280, 280), chunks=(10, 40,40))
      8 data_upsampled = da.map_blocks(lambda x: zoom(x,2), data, dtype = np.uint16, chunks=(20,80,80))
---> 10 res = da.map_overlap(lambda x,y: x+y, data_upsampled, data2, depth = (5,10,10), boundary="constant")
     11 #data_upsampled.to_hdf5('myfile.hdf5', '/up_sampled')

File \AppData\Local\anaconda3\envs\napari\lib\site-packages\dask\array\overlap.py:693, in map_overlap(func, depth, boundary, trim, align_arrays, allow_rechunk, *args, **kwargs)
    690 if align_arrays:
    691     # Reverse unification order to allow block broadcasting
    692     inds = [list(reversed(range(x.ndim))) for x in args]
--> 693     _, args = unify_chunks(*list(concat(zip(args, inds))), warn=False)
    695 # Escape to map_blocks if depth is zero (a more efficient computation)
    696 if all([all(depth_val == 0 for depth_val in d.values()) for d in depth]):

File \AppData\Local\anaconda3\envs\napari\lib\site-packages\dask\array\core.py:3971, in unify_chunks(*args, **kwargs)
   3968     else:
   3969         nameinds.append((a, ind))
-> 3971 chunkss = broadcast_dimensions(nameinds, blockdim_dict, consolidate=common_blockdim)
   3972 nparts = math.prod(map(len, chunkss.values()))
   3974 if warn and nparts and nparts >= max_parts * 10:

File \AppData\Local\anaconda3\envs\napari\lib\site-packages\dask\blockwise.py:1467, in broadcast_dimensions(argpairs, numblocks, sentinels, consolidate)
   1464 g2 = {k: v - set(sentinels) if len(v) > 1 else v for k, v in g.items()}
...
   3883 # burned through all of the chunk tuples.
   3884 # For efficiency's sake we reverse the lists so that we can pop off the end
   3885 rchunks = [list(ntd)[::-1] for ntd in non_trivial_dims]

ValueError: ('Chunks do not add up to same value', {(40, 40, 40, 40, 40, 40, 40), (80, 80, 80, 80)})

I have not yet found a way to solve this, and I would greatly appreciate any help with the matter.

dask

1 Answer

`data_upsampled` and `data2` do not end up with the same chunk layout, and you need to make sure the chunks of the two arrays can be aligned. The problem is that the chunks of the `data` array do not divide the array evenly: along the last two axes, a length of 140 split into chunks of 40 leaves a final chunk of only 20, so after zooming by 2 the real chunks are (80, 80, 80, 40), even though `map_blocks` was told via `chunks=(20, 80, 80)` that every chunk would be 80 wide. That is exactly the mismatch in the traceback: the declared chunks sum to 320 while `data2`'s corresponding axis is 280.

In general, it is also better for the two arrays to share the same chunking.
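The chunk arithmetic behind the error can be sketched in plain Python (no Dask needed; `chunk_layout` is just an illustrative helper, not a Dask API):

```python
def chunk_layout(length, chunk):
    """Split an axis of `length` into chunks of size `chunk` (last one may be smaller)."""
    full, rem = divmod(length, chunk)
    return (chunk,) * full + ((rem,) if rem else ())

# Axis 1 of `data` (length 140, chunks of 40): the last chunk is only 20 wide.
src = chunk_layout(140, 40)                 # (40, 40, 40, 20)
after_zoom = tuple(2 * c for c in src)      # (80, 80, 80, 40), sums to 280

# But map_blocks was told chunks=(20, 80, 80), i.e. every chunk is 80 wide:
declared = (80,) * len(src)                 # (80, 80, 80, 80), sums to 320
assert sum(after_zoom) == 280 and sum(declared) == 320   # mismatch -> ValueError

# With the answer's chunking (20 along this axis), 140 divides evenly,
# and doubling the chunks reproduces data2's layout exactly:
fixed = chunk_layout(140, 20)               # (20,) * 7
assert tuple(2 * c for c in fixed) == chunk_layout(280, 40)   # (40,) * 7
```

In other words, pick source chunks that divide every axis evenly and whose doubled size equals `data2`'s chunk size.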

Working code, for example:

import dask.array as da
import numpy as np
from scipy.ndimage import zoom

data = da.ones((40, 140, 140), chunks=(5, 20, 20))
data2 = da.ones((80, 280, 280), chunks=(10, 40, 40))
data_upsampled = da.map_blocks(lambda x: zoom(x, 2), data, dtype=np.uint16, chunks=(10, 40, 40))
res = da.map_overlap(lambda x, y: x + y, data_upsampled, data2, depth=(5, 10, 10), boundary="constant")
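As a quick sanity check (a sketch assuming `dask` and `scipy` are installed), you can confirm that the two chunk grids now agree before calling `map_overlap`:

```python
import dask.array as da
import numpy as np
from scipy.ndimage import zoom

data = da.ones((40, 140, 140), chunks=(5, 20, 20))
data2 = da.ones((80, 280, 280), chunks=(10, 40, 40))
data_upsampled = da.map_blocks(lambda x: zoom(x, 2), data,
                               dtype=np.uint16, chunks=(10, 40, 40))

# The chunk grids must match for map_overlap to align the overlaps.
assert data_upsampled.chunks == data2.chunks

res = da.map_overlap(lambda x, y: x + y, data_upsampled, data2,
                     depth=(5, 10, 10), boundary="constant")
assert res.shape == (80, 280, 280)
```

Checking `.chunks` this way is cheap because it only inspects the task-graph metadata; nothing is computed until you call `res.compute()`.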