Mapping over an array with dask and returning a dataframe

Question (votes: 0, answers: 1)

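I am using dask and zarr to work with some very large images.

I have set up a pipeline that applies a few transformations to these images, and I then want to measure image properties with the regionprops and regionprops_table functions from skimage. regionprops_table takes an array as input and returns a dict of columns, which I wrap in a dataframe. I can't use map_overlap, because it stitches the per-block results back into an array, but I would like something similar to this: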

import numpy as np
import dask.array as da
import pandas as pd
from skimage.measure import regionprops_table

mask = np.zeros((1000, 1000), dtype=int)
mask[100:200, 100:200] = 1
mask[300:400, 300:400] = 2
mask[500:600, 500:600] = 3
mask = da.from_array(mask, chunks=(200, 200))

def get_data_frame(mask):
    res = regionprops_table(mask, properties=('label', 'area', 'eccentricity'))
    df = pd.DataFrame(res)
    return df

mask.map_overlap(get_data_frame, depth=50, boundary=None).compute()
    

This should return a pandas dataframe or a dask dataframe, with each chunk processed in parallel.

python dask
1 answer
0 votes

Have you tried this?

import numpy as np
import dask.array as da
import pandas as pd
from skimage.measure import regionprops_table
from dask import delayed

# Function to compute regionprops for a chunk
def compute_chunk(chunk):
    res = regionprops_table(chunk, properties=('label', 'area', 'eccentricity'))
    df = pd.DataFrame(res)
    return df

# Build one delayed DataFrame per block. map_blocks/map_overlap are not used
# because they expect the mapped function to return an array.
def get_data_frame_blockwise(mask, depth=50):
    # Share `depth` pixels between neighbouring blocks (no padding at the array edges)
    expanded = da.overlap.overlap(mask, depth=depth, boundary=None)

    # Delayed call to compute regionprops for each block
    delayed_results = [delayed(compute_chunk)(block) for block in expanded.to_delayed().ravel()]

    # Concatenate the per-block results; still lazy at this point
    return delayed(pd.concat)(delayed_results, ignore_index=True)

# Generate your mask
mask = np.zeros((1000, 1000), dtype=int)
mask[100:200, 100:200] = 1
mask[300:400, 300:400] = 2
mask[500:600, 500:600] = 3
mask = da.from_array(mask, chunks=(200, 200))

# Blocks are processed in parallel when compute() is called.
# An object inside an overlap region is reported by every block that sees it.
result = get_data_frame_blockwise(mask, depth=50)
result_df = result.compute()

# Display the result
print(result_df)
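
One caveat with any per-block approach: a labelled object that straddles a chunk boundary, or falls inside an overlap region, shows up in the results of more than one block. For additive properties such as area, one possible way around this is to process the blocks without overlap and sum the partial results by label. Below is a minimal sketch under that assumption. `measure_block` and `area_per_label` are hypothetical names, and `measure_block` is a stand-in based on `np.unique` so the sketch runs without scikit-image; a `regionprops_table` call restricted to additive properties could take its place. Non-additive properties such as eccentricity cannot be merged this way.

```python
import numpy as np
import pandas as pd
import dask
import dask.array as da


def measure_block(block):
    """Hypothetical stand-in for regionprops_table: pixel count per label."""
    labels, counts = np.unique(block, return_counts=True)
    keep = labels != 0  # label 0 is treated as background
    return pd.DataFrame({"label": labels[keep], "area": counts[keep]})


def area_per_label(mask):
    """Measure every block in parallel, then merge partial areas by label."""
    blocks = mask.to_delayed().ravel()  # one Delayed object per chunk
    parts = [dask.delayed(measure_block)(b) for b in blocks]
    frames = dask.compute(*parts)  # runs the per-block tasks in parallel
    merged = pd.concat(frames, ignore_index=True)
    return merged.groupby("label", as_index=False)["area"].sum()


mask = np.zeros((1000, 1000), dtype=int)
mask[100:200, 100:200] = 1
mask[350:450, 350:450] = 2  # deliberately straddles the chunk edge at 400
mask[500:600, 500:600] = 3

areas = area_per_label(da.from_array(mask, chunks=(200, 200)))
print(areas)
```

Label 2 is split across four chunks here, and the group-by sum restores its full area of 10000 pixels. If a dask dataframe is preferred over a pandas one, the same list of delayed frames could be handed to `dask.dataframe.from_delayed` instead of `dask.compute`.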