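I am using dask and zarr to work with some very large images.
I have set up a pipeline that applies a few transformations to these images, and I then want to measure image properties with the regionprops and regionprops_table functions from skimage. These take an array as input and return tabular data that I turn into a data frame. I can't use map_overlap, because it stitches the per-chunk results back together into an array, but I would like something similar to this: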
import numpy as np
import dask.array as da
import pandas as pd
from skimage.measure import regionprops_table

mask = np.zeros((1000, 1000), dtype=int)
mask[100:200, 100:200] = 1
mask[300:400, 300:400] = 2
mask[500:600, 500:600] = 3
mask = da.from_array(mask, chunks=(200, 200))

def get_data_frame(mask):
    res = regionprops_table(mask, properties=('label', 'area', 'eccentricity'))
    df = pd.DataFrame(res)
    return df

# What I would like to be able to write (map_overlap expects arrays back, not data frames)
mask.map_overlap(get_data_frame, depth=50, boundary=None).compute()
I would like this to return either a pandas DataFrame or a dask DataFrame, with each chunk processed in parallel.
Have you tried something like this?
import numpy as np
import dask
import dask.array as da
import pandas as pd
from dask import delayed
from dask.array.overlap import overlap
from skimage.measure import regionprops_table

# Compute regionprops for a single chunk (a NumPy array)
def compute_chunk(chunk):
    res = regionprops_table(chunk, properties=('label', 'area', 'eccentricity'))
    df = pd.DataFrame(res)
    return df

# Measure every overlapped block in parallel and combine the tables
def get_data_frame_blockwise(mask, depth):
    # Extend each block by `depth` pixels borrowed from its neighbours;
    # boundary='none' adds no padding at the outer image edges
    overlapped = overlap(mask, depth=depth, boundary='none')
    # One delayed regionprops call per block (map_overlap cannot be used
    # here because it expects each block to come back as an array)
    delayed_results = [delayed(compute_chunk)(block)
                       for block in overlapped.to_delayed().ravel()]
    # Run all tasks in parallel; dask.compute returns a tuple of DataFrames
    computed_results = dask.compute(*delayed_results)
    # Concatenate the per-block tables into a single DataFrame
    result_df = pd.concat(computed_results, ignore_index=True)
    return result_df

# Generate your mask
mask = np.zeros((1000, 1000), dtype=int)
mask[100:200, 100:200] = 1
mask[300:400, 300:400] = 2
mask[500:600, 500:600] = 3
mask = da.from_array(mask, chunks=(200, 200))

# Measure the overlapped blocks
result_df = get_data_frame_blockwise(mask, depth=50)

# Display the result
# (an object lying in an overlap zone is measured by more than one block,
# so its label can appear in several rows)
print(result_df)
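
One caveat when overlapped chunks are measured independently: an object that falls in an overlap zone is seen by several chunks, so the same label can show up more than once, sometimes with only part of its area. Below is a minimal sketch of one way to clean that up, assuming every object lies entirely inside at least one overlapped chunk (for example, objects no larger than the overlap depth), so the row with the largest area is the complete measurement. The helper name deduplicate_regions and the numbers in the example table are hypothetical, chosen only for illustration.

```python
import pandas as pd


def deduplicate_regions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per label: the one with the largest measured area.

    Assumption (not from the original post): each object is fully
    contained in at least one overlapped chunk, so its largest
    measured area is the true one.
    """
    # Make sure the index is unique before using idxmax
    df = df.reset_index(drop=True)
    # Index of the largest-area row for each label
    idx = df.groupby("label")["area"].idxmax()
    return df.loc[idx].sort_values("label").reset_index(drop=True)


# Hypothetical combined table: each label was seen by several chunks,
# some of which only contained part of the object
combined = pd.DataFrame({
    "label": [1, 1, 1, 1, 2, 2],
    "area": [10000, 5000, 5000, 2500, 5000, 10000],
})

deduped = deduplicate_regions(combined)
print(deduped)
```

This only works for properties measured on the whole object. Objects larger than the overlap depth can be cut in every chunk, in which case a bigger depth or a different strategy is needed.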