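I have a large image classification dataset stored in .hdf5 format. Both the labels and the images are stored in the .hdf5 file. I cannot view the images because they are stored as arrays. The code I use to read the dataset is as follows: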
import h5py
import numpy
f = h5py.File('data/images.hdf5', 'r')
print(list(f.keys()))
['datasets']
group = f['datasets']
list(group.keys())
['car']
Now, when I read the car group, I get the following output:
data = group['car']
data.shape,data[0].shape,data[1].shape
((51,), (383275,), (257120,))
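So it looks like there are 51 images for the label car, and the images are stored as 1-D arrays of length 383275 and 257120, with no information about their height and width. I want to save the images as RGB again.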
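A quick sanity check on those two lengths may help narrow things down. A raw RGB image needs height × width × 3 values, so its flattened length has to be divisible by 3. The sketch below uses a hypothetical helper, `raw_rgb_shapes`, that is not part of h5py, NumPy, or PIL; it lists the candidate (height, width) pairs for a given length.

```python
def raw_rgb_shapes(length, channels=3):
    """Return (height, width) pairs, height <= width, with height * width * channels == length."""
    if length % channels != 0:
        # The length cannot be split evenly into pixels with this many channels.
        return []
    pixels = length // channels
    shapes = []
    h = 1
    while h * h <= pixels:
        if pixels % h == 0:
            shapes.append((h, pixels // h))
        h += 1
    return shapes


# Lengths reported by data[0].shape and data[1].shape in the question.
for n in (383275, 257120):
    print(n, "remainder mod 3:", n % 3, "candidate shapes:", raw_rgb_shapes(n))
```

Neither length is a multiple of 3, so under this check the arrays cannot be raw, uncompressed RGB pixel buffers.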
Next, following the code here, I tried to read an image.
import numpy as np
from PIL import Image
# hdf = h5py.File("Sample.h5",'r')
array = data[0]
img = Image.fromarray(array.astype('uint8'), 'RGB')
img.save("yourimage.thumbnail", "JPEG")
img.show()
Unfortunately, I get the following error:
File /usr/local/lib/python3.8/dist-packages/PIL/Image.py:784, in Image.frombytes(self, data, decoder_name, *args)
781 s = d.decode(data)
783 if s[0] >= 0:
--> 784 raise ValueError("not enough image data")
785 if s[1] != 0:
786 raise ValueError("cannot decode image data")
ValueError: not enough image data
I have already checked references such as the HDF Group help pages and libraries. Any help would be much appreciated. Thanks.
import h5py
import numpy as np
from PIL import Image

# Placeholder dimensions: replace with the real ones if you know them.
# np.reshape raises ValueError unless height * width * channels == data[i].size.
height, width, channels = 256, 256, 3

# Open the HDF5 file
with h5py.File('data/images.hdf5', 'r') as f:
    # Access the dataset containing the images
    data = f['datasets']['car']
    # Iterate through each image stored as a 1-D array
    for i in range(len(data)):
        # Reshape the 1-D array into a 3-D array of shape (height, width, channels)
        image_array = np.reshape(data[i], (height, width, channels))
        # Convert the array to a PIL image; (H, W, 3) uint8 arrays are read as RGB
        img = Image.fromarray(image_array.astype('uint8'))
        # Save the image
        img.save(f'image_{i}.png')
        img.show()  # Remove this line if you don't want to display the image
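The reshape approach only works when every array has exactly height × width × 3 elements. The arrays in the question have different lengths (383275 and 257120), and neither is divisible by 3, so another possibility is that each element holds the bytes of an encoded image file such as JPEG or PNG. That is an assumption the question does not confirm. Under it, a minimal sketch would decode the bytes with PIL instead of reshaping them. The helper name `decode_image_bytes` is hypothetical.

```python
import io

import numpy as np
from PIL import Image


def decode_image_bytes(array):
    """Decode a 1-D uint8 array that is ASSUMED to hold an encoded image file.

    Image.open raises PIL.UnidentifiedImageError if the bytes are not a
    recognised image format, which would mean the assumption is wrong.
    """
    raw = np.asarray(array, dtype=np.uint8).tobytes()
    img = Image.open(io.BytesIO(raw))
    return img.convert("RGB")  # forces decoding and normalises the mode to RGB


# Round-trip demo on a synthetic image, so the sketch runs without the HDF5 file.
buffer = io.BytesIO()
Image.new("RGB", (4, 3), (255, 0, 0)).save(buffer, format="PNG")
encoded = np.frombuffer(buffer.getvalue(), dtype=np.uint8)  # 1-D array, like data[i]
decoded = decode_image_bytes(encoded)
print(decoded.size, decoded.mode)

# With the file from the question, the same helper would be used like this:
# with h5py.File('data/images.hdf5', 'r') as f:
#     data = f['datasets']['car']
#     for i in range(len(data)):
#         decode_image_bytes(data[i]).save(f'image_{i}.png')
```

If decoding succeeds, the height and width come from the image header itself (`decoded.size`), so they do not need to be stored separately in the HDF5 file.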