Saving a Dask DataFrame with an image column to HDF5


I am trying to load images of various sizes into a column of a Dask DataFrame and save the DataFrame in HDF5 format.

This is the straightforward approach:

import glob

import dask.dataframe as dd
import numpy as np
import pandas as pd
from skimage.io import imread

# Named image_dir to avoid shadowing the built-in dir()
image_dir = '/Users/petioptrv/Downloads/mask'
filenames = glob.glob(image_dir + '/*.png')[:5]

df = pd.DataFrame({"paths": filenames})
ddf = dd.from_pandas(df, npartitions=2)
# Each cell ends up holding a whole ndarray, so the column is really object dtype
ddf['images'] = ddf['paths'].apply(imread, meta=('images', np.uint8))
ddf.to_hdf('test.h5', '/data')  # raises the TypeError shown below

I get the following error message:

...
  File "/Users/petioptrv/miniconda3/envs/dask/lib/python3.7/site-packages/pandas/io/pytables.py", line 2214, in set_atom_string
    item=item, type=inferred_type
TypeError: Cannot serialize the column [images] because
its data contents are [mixed] object dtype

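Essentially, the PyTables layer in pandas (pandas/io/pytables.py) sees that the column has object dtype and checks whether its contents are strings. They are not, so it raises an exception.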
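A minimal sketch of that situation, using two small synthetic arrays in place of the PNG files (the shapes are arbitrary assumptions, not values from the question). It shows that a column of differently sized arrays has object dtype and contains no strings. The call to pd.api.types.infer_dtype is included because it is, as far as I can tell, the helper behind the "[mixed]" label in the error; treat that link as an assumption.

```python
import numpy as np
import pandas as pd

# Two synthetic "images" of different sizes stand in for the PNG files.
cells = np.empty(2, dtype=object)
cells[0] = np.zeros((2, 3), dtype=np.uint8)
cells[1] = np.zeros((4, 5), dtype=np.uint8)
images = pd.Series(cells, name="images")

# The column is object dtype even though every cell holds uint8 data.
is_object_column = images.dtype == object

# The string check fails because the cells are ndarrays, not str.
all_strings = all(isinstance(value, str) for value in images)

print(images.dtype)
print(pd.api.types.infer_dtype(images, skipna=False))
print(all_strings)
```

Declaring the column as np.uint8 in meta does not change this: meta only tells Dask what to expect, while the computed partitions still hold one ndarray per cell.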

I could probably hack around this by opening the images as byte arrays and converting them to strings, but that is far from ideal.
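For reference, the byte-array workaround mentioned above could look roughly like the sketch below. The helper names encode_image and decode_image are hypothetical, and the choice of np.save plus base64 is an assumption made for illustration: it keeps shape and dtype inside the string so images of different sizes can be restored. A synthetic array is used instead of a real PNG.

```python
import base64
import io

import numpy as np


def encode_image(image: np.ndarray) -> str:
    """Serialize an array (shape and dtype included) to an ASCII string."""
    buffer = io.BytesIO()
    np.save(buffer, image, allow_pickle=False)
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def decode_image(text: str) -> np.ndarray:
    """Rebuild the array from a string produced by encode_image."""
    buffer = io.BytesIO(base64.b64decode(text))
    return np.load(buffer, allow_pickle=False)


# Synthetic stand-in for the result of imread(path).
original = np.arange(24, dtype=np.uint8).reshape(4, 6)
encoded = encode_image(original)
restored = decode_image(encoded)

print(type(encoded).__name__, len(encoded))
print(np.array_equal(original, restored))
```

In the pipeline from the question this would presumably be applied as ddf['paths'].apply(lambda p: encode_image(imread(p)), meta=('images', object)), which I have not tested against to_hdf. Base64 also inflates the data by about a third, which is part of why this route is unattractive.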

image dataframe dask hdf5
1 Answer

Try specifying data_columns, as suggested in a related question:

ddf.to_hdf('test.h5', '/data', format='table', data_columns=['images'])