I am working with two variables that can be thought of as matrices: "a" with dimensions (100, 100) and "b" with dimensions (200, 200). My goal is to perform a computation involving both "a" and "b", but first I need to upsample "a" by a factor of 2 so that it matches the size of "b". I then intend to apply my custom function, crucially using Dask's map_overlap function so that it runs efficiently with the specified overlap. Ultimately, I want a result that matches the size of "b".
Below is Python code demonstrating this scenario.
import dask.array as da
import numpy as np
from scipy.ndimage import zoom

data = da.ones((40, 140, 140), chunks=(10, 40, 40))
data2 = da.ones((80, 280, 280), chunks=(10, 40, 40))
data_upsampled = da.map_blocks(lambda x: zoom(x, 2), data, dtype=np.uint16, chunks=(20, 80, 80))
res = da.map_overlap(lambda x, y: x + y, data_upsampled, data2, depth=(5, 10, 10), boundary="constant")
However, I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[7], line 10
7 data2 = da.ones((80, 280, 280), chunks=(10, 40,40))
8 data_upsampled = da.map_blocks(lambda x: zoom(x,2), data, dtype = np.uint16, chunks=(20,80,80))
---> 10 res = da.map_overlap(lambda x,y: x+y, data_upsampled, data2, depth = (5,10,10), boundary="constant")
11 #data_upsampled.to_hdf5('myfile.hdf5', '/up_sampled')
File \AppData\Local\anaconda3\envs\napari\lib\site-packages\dask\array\overlap.py:693, in map_overlap(func, depth, boundary, trim, align_arrays, allow_rechunk, *args, **kwargs)
690 if align_arrays:
691 # Reverse unification order to allow block broadcasting
692 inds = [list(reversed(range(x.ndim))) for x in args]
--> 693 _, args = unify_chunks(*list(concat(zip(args, inds))), warn=False)
695 # Escape to map_blocks if depth is zero (a more efficient computation)
696 if all([all(depth_val == 0 for depth_val in d.values()) for d in depth]):
File \AppData\Local\anaconda3\envs\napari\lib\site-packages\dask\array\core.py:3971, in unify_chunks(*args, **kwargs)
3968 else:
3969 nameinds.append((a, ind))
-> 3971 chunkss = broadcast_dimensions(nameinds, blockdim_dict, consolidate=common_blockdim)
3972 nparts = math.prod(map(len, chunkss.values()))
3974 if warn and nparts and nparts >= max_parts * 10:
File \AppData\Local\anaconda3\envs\napari\lib\site-packages\dask\blockwise.py:1467, in broadcast_dimensions(argpairs, numblocks, sentinels, consolidate)
1464 g2 = {k: v - set(sentinels) if len(v) > 1 else v for k, v in g.items()}
...
3883 # burned through all of the chunk tuples.
3884 # For efficiency's sake we reverse the lists so that we can pop off the end
3885 rchunks = [list(ntd)[::-1] for ntd in non_trivial_dims]
ValueError: ('Chunks do not add up to same value', {(40, 40, 40, 40, 40, 40, 40), (80, 80, 80, 80)})
I have not found a way to resolve this, and I would greatly appreciate any help with it.
data_upsampled and data2 do not end up with the same chunk structure, and map_overlap needs the chunks of the two arrays to line up. The root cause is that the chunks of data, (10, 40, 40), do not divide its shape (40, 140, 140) evenly along the last two axes, so after zooming, the chunks dask records for data_upsampled no longer add up to the shape of data2.
In general, it is also better for the two arrays to share the same chunking.
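You can see the mismatch directly by inspecting the .chunks that dask records for each array; a small sketch reproducing the setup above:

```python
import dask.array as da
import numpy as np
from scipy.ndimage import zoom

data = da.ones((40, 140, 140), chunks=(10, 40, 40))
data2 = da.ones((80, 280, 280), chunks=(10, 40, 40))

# 140 is not divisible by 40, so the last block along each spatial axis is smaller:
print(data.chunks[1])  # (40, 40, 40, 20)

# map_blocks records the declared chunks=(20, 80, 80) for *every* block,
# so dask now believes the second axis is 4 * 80 = 320 long...
data_upsampled = da.map_blocks(
    lambda x: zoom(x, 2), data, dtype=np.uint16, chunks=(20, 80, 80)
)
print(data_upsampled.chunks[1])  # (80, 80, 80, 80) -> sums to 320

# ...while data2's second axis is 7 * 40 = 280, hence the
# "Chunks do not add up to same value" error from map_overlap.
print(data2.chunks[1])  # (40, 40, 40, 40, 40, 40, 40) -> sums to 280
```

These are exactly the two chunk tuples shown in the ValueError above.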
Working code, for example:
import dask.array as da
import numpy as np
from scipy.ndimage import zoom

data = da.ones((40, 140, 140), chunks=(5, 20, 20))
data2 = da.ones((80, 280, 280), chunks=(10, 40, 40))
data_upsampled = da.map_blocks(lambda x: zoom(x, 2), data, dtype=np.uint16, chunks=(10, 40, 40))
res = da.map_overlap(lambda x, y: x + y, data_upsampled, data2, depth=(5, 10, 10), boundary="constant")