Patch creation approach for deep learning on very large data with relatively little memory (32GB)

Question

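I am trying to train a deep learning semantic segmentation model on satellite imagery. As part of this, I ran a test of the data creation with patchify and rasterio on a small AOI without any problems. However, I am now trying to scale this up to include more patches for training the model, and I have enlarged my AOI to achieve this. For context, I previously had an ndarray of about 41848x14555x9 (x, y, n_bands). Now I want to increase it to 84632x37000x9 (x, y, n_bands).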

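Unfortunately, numpy cannot even attempt to load the array into memory via rasterio, because the array is far too large for the memory I have: it is about 128 GB (120 GiB), and my RAM is 32 GB. The error message is: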

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 120. GiB for an array with shape (9, 42170, 84632) and data type float32
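
As a quick sanity check of the sizes involved: the footprint of a dense array is just the product of its dimensions times the bytes per element. A minimal NumPy sketch, using the shape and dtype from the error message above (the helper name `array_nbytes` is hypothetical, not part of any library):

```python
import numpy as np

def array_nbytes(shape, dtype):
    """Bytes needed to hold a dense array of the given shape and dtype (hypothetical helper)."""
    # Use int64 for the product so large shapes cannot overflow
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize

# Shape and dtype taken from the error message: (band, y, x), float32
nbytes = array_nbytes((9, 42170, 84632), np.float32)
print(f"{nbytes / 1e9:.1f} GB = {nbytes / 2**30:.1f} GiB")  # → 128.5 GB = 119.7 GiB
```

That is roughly four times the 32 GB of RAM, so the full stack cannot be materialized as a single in-memory array, whatever library reads it.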

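Because lazy arrays are available, I tried a combination of rioxarray/xarray and np.memmap. However, filling the memmap is very slow, because I have to iterate over the bands and an axis so that the values fit into the memory I have, i.e.: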

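import numpy as np
import rioxarray

memmap_name = "image_stack.dat"  # hypothetical on-disk file backing the memmap
image_io = rioxarray.open_rasterio("/path/to/image_stack.tif")  # lazily opened, dims (band, y, x)
raster = np.memmap(memmap_name, dtype=np.float32, mode="w+", shape=image_io.shape)
# Copy one band/column strip at a time so only a small slice is ever in RAM
for i in range(image_io.sizes["band"]):
    for j in range(image_io.sizes["x"]):
        raster[i, :, j] = image_io[i, :, j].values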
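
Much of the slowness likely comes from copying one column at a time: in a (band, y, x) layout a column is strided both in the memmap and in the source raster, so each iteration does a small, scattered read and write. A minimal sketch of the alternative, assuming the same (band, y, x) layout: copy full-width horizontal strips of many rows at once, so each write touches contiguous memory. The function name `copy_in_blocks` and the `rows_per_block` parameter are hypothetical.

```python
import numpy as np

def copy_in_blocks(src, dst, rows_per_block):
    """Copy a (band, y, x) array-like into `dst` one horizontal strip at a time.

    `src` can be anything that supports positional slicing and np.asarray
    (a NumPy array, a lazily opened xarray.DataArray, ...). Only one strip of
    band * rows_per_block * x values is held in RAM at once.
    """
    n_rows = src.shape[1]
    for y0 in range(0, n_rows, rows_per_block):
        y1 = min(y0 + rows_per_block, n_rows)  # last strip may be shorter
        # One large contiguous read/write instead of thousands of column copies
        dst[:, y0:y1, :] = np.asarray(src[:, y0:y1, :])
    if hasattr(dst, "flush"):
        dst.flush()  # write dirty pages of an np.memmap back to disk
```

With the names from the question this would be called as `copy_in_blocks(image_io, raster, rows_per_block=1024)`. For a 84632-pixel-wide, 9-band float32 image each row costs 9 × 84632 × 4 ≈ 3.0 MB, so a (hypothetical) block of 1024 rows keeps about 3.1 GB in RAM per strip; lower it if memory is tight.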

I think the most important thing to ask is: is this even possible with the hardware resources I have?

If so, is there a better approach than the one I listed above?

I'm not tied to patchify, but it seems to be the go-to library for generating smaller image tiles. Thanks in advance for any suggestions!
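
Tiling itself does not require the whole image in memory: it is enough to enumerate patch offsets and read one window at a time. A minimal sketch, assuming a (band, y, x) source such as an `np.memmap`, and assuming edge windows that do not fit are simply skipped (padding would be an alternative). Both function names are hypothetical; this is not patchify's API.

```python
import numpy as np

def iter_patch_windows(height, width, patch, stride=None):
    """Yield (row_off, col_off) for every full patch x patch window.

    stride defaults to patch (non-overlapping tiles); windows that would run
    past the bottom or right edge are skipped.
    """
    stride = stride or patch
    for row in range(0, height - patch + 1, stride):
        for col in range(0, width - patch + 1, stride):
            yield row, col

def iter_patches(arr, patch, stride=None):
    """Yield (band, patch, patch) arrays from a (band, y, x) array one at a time.

    With an np.memmap, only the current patch is read into RAM.
    """
    _, height, width = arr.shape
    for row, col in iter_patch_windows(height, width, patch, stride):
        yield np.asarray(arr[:, row:row + patch, col:col + patch])
```

For the 42170 x 84632 image in the error message and a hypothetical non-overlapping 256-pixel patch, this yields 164 x 330 = 54,120 patches of about 2.4 MB each (256 × 256 × 9 float32 values). The same offsets can drive rasterio's windowed reads directly, e.g. `src.read(window=Window(col_off, row_off, patch, patch))` with `rasterio.windows.Window`, whose argument order is column offset first, then row offset.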

Answer 1

You may find Xarray + Dask + Xbatcher to be an efficient alternative. A sketch:

import rioxarray
import xbatcher

# hypothetical patch size (pixels) and number of bands per patch
xc, yc, bc = 256, 256, 9

# open tiff as an xarray.DataArray backed by a lazy dask array
# (rioxarray names the dimensions 'band', 'y', 'x')
# you can tune the chunk size to the size of your patch
chunks = {"band": bc, "y": yc, "x": xc}
da = rioxarray.open_rasterio("/path/to/image_stack.tif", chunks=chunks)

# create the xbatcher.BatchGenerator, this will let you iterate through your
# DataArray in smaller-than-memory batches.
bgen = xbatcher.BatchGenerator(da, input_dims={"band": bc, "y": yc, "x": xc})
for patch in bgen:
    # handle each patch here
    ...
