Compressing arrays in a netCDF file

Question

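I would like to compress arrays stored in a netCDF file by applying a scale factor and an add offset, converting the array data type in the process (for example float32 to int16).

I want to turn raster datasets that are normally too large to handle in Python into smaller, more manageable ones. I know a scale factor and offset can be applied to netCDF data, which not only makes the file smaller but also lets the same logic be applied when the array is loaded, which helps with memory management. I already have another way of managing large arrays with NumPy, but I would like to do this with netCDF files.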
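To make the idea concrete, here is a minimal NumPy-only sketch of the packing round trip described above. It reuses the scale/offset formula from the code in this question; the helper names `pack` and `unpack` and the random stand-in array `band` are assumptions for illustration, not part of any netCDF API.

```python
import numpy as np


def compute_scale_and_offset(vmin, vmax, n=16):
    """Scale and offset that map [vmin, vmax] onto a signed n-bit integer range."""
    scale_factor = (vmax - vmin) / (2 ** n - 1)
    add_offset = vmin + 2 ** (n - 1) * scale_factor
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Float array -> int16. Lossy: each value is rounded to the nearest step."""
    scaled = (values.astype(np.float64) - add_offset) / scale_factor
    return np.round(scaled).astype(np.int16)


def unpack(packed, scale_factor, add_offset):
    """int16 array -> approximate float32 values."""
    return (packed.astype(np.float64) * scale_factor + add_offset).astype(np.float32)


# Hypothetical stand-in for a float32 raster band
rng = np.random.default_rng(0)
band = rng.uniform(-50.0, 350.0, size=(200, 300)).astype(np.float32)

scale_factor, add_offset = compute_scale_and_offset(float(band.min()), float(band.max()))
packed = pack(band, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)

# int16 takes half the memory of float32
print(band.nbytes, packed.nbytes)
# the round-trip error should stay near half a quantisation step
print(float(np.abs(restored - band).max()), scale_factor / 2)
```

The packed array needs half the bytes of the float32 original, at the cost of rounding every value to one of 2**16 levels between the minimum and the maximum.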

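The code below is what I have so far. It is based on a few links: http://james.hiebert.name/blog/work/2015/04/18/NetCDF-scale-factors.html

The test file I am using is a netCDF file I generated myself. It holds a float32 NumPy array and was produced by converting a GeoTIFF file to netCDF with gdal_translate.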

import netCDF4
from math import floor
import numpy as np


def compute_scale_and_offset(min, max, n):
    # stretch/compress data to the available packed range
    scale_factor = (max - min) / (2 ** n - 1)
    # translate the range to be symmetric about zero
    add_offset = min + 2 ** (n - 1) * scale_factor
    return scale_factor, add_offset

def pack_value(unpacked_value, scale_factor, add_offset):
    # subtract the offset first, then divide by the scale factor
    return (unpacked_value - add_offset) / scale_factor

def unpack_value(packed_value, scale_factor, add_offset):
    return packed_value * scale_factor + add_offset

netcdf_path = r"path/to/netcdf"

nc = netCDF4.Dataset(netcdf_path,"a")
data = nc.variables['Band1'][:]

scale_factor,offset = compute_scale_and_offset(np.min(data),np.max(data),16)
data = pack_value(data,scale_factor,offset)
data_b = data.astype(np.int16,copy=False)
nc.variables['Band1'][:] = data_b

nc.close()

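Currently, when I run the code above, the size of the file I am working on does not change, although the values in the core data array do. What I am after is a version of the code above that can convert the data array in any generic netCDF file, apply the scale factor and offset, and store them in the file so that they are picked up when the data is read back with netCDF4.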
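A likely explanation for the unchanged file size, sketched with plain NumPy as a stand-in: as far as I understand, the type of a netCDF variable is fixed when the variable is created, so writing int16 values into the existing float32 'Band1' stores them as float32 again. The code above also never writes scale_factor or add_offset attributes, so a reader has no way to unpack the values. The array names below are illustrative only.

```python
import numpy as np

# Stand-in for a variable that was created as float32
# (assumption: 'Band1' is float32, as the question describes).
stored = np.zeros(4, dtype=np.float32)

# Integer data, like data_b in the question
packed = np.array([-32768, -1, 0, 32767], dtype=np.int16)

# Slice assignment casts the values to the destination's type;
# it does not change the destination's type or size.
stored[:] = packed

print(stored.dtype, stored.nbytes)
```

The values change but the storage type does not, which matches the symptom described: different numbers in the array, same size on disk.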

Answer 1

xarray can do this through the encoding argument of to_netcdf:

import numpy as np
import xarray as xr


def compute_scale_and_offset(da, n=16):
    """Calculate offset and scale factor for int conversion

    Based on Krios101's code above.
    """

    vmin = np.min(da).item()
    vmax = np.max(da).item()

    # stretch/compress data to the available packed range
    scale_factor = (vmax - vmin) / (2 ** n - 1)

    # translate the range to be symmetric about zero
    add_offset = vmin + 2 ** (n - 1) * scale_factor

    return scale_factor, add_offset


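ds = xr.open_dataset("infile.nc")

# use the opened dataset (ds); 'my_var' stands for the variable to pack
scale_factor, add_offset = compute_scale_and_offset(ds['my_var'])

outfile = "outfile.nc"  # placeholder output path
ds.to_netcdf(outfile, encoding={"my_var": {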
    "dtype": 'int16',
    "scale_factor": scale_factor,
    "add_offset": add_offset,
    "_FillValue": -32767,
}})
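Since the question asks for a netCDF4-based route, here is a rough sketch of the same idea without xarray. It is not taken from the answer above. The function name `write_packed_copy` is hypothetical, it copies only the named variable and its dimensions (coordinate variables and attributes are left out), and it relies on netCDF4-python's default behaviour of packing data on write when a variable carries scale_factor and add_offset attributes.

```python
import numpy as np


def compute_scale_and_offset(vmin, vmax, n=16):
    """Same formula as in the question: map [vmin, vmax] onto n-bit integers."""
    scale_factor = (vmax - vmin) / (2 ** n - 1)
    add_offset = vmin + 2 ** (n - 1) * scale_factor
    return scale_factor, add_offset


def write_packed_copy(src_path, dst_path, var_name="Band1"):
    """Copy one variable into a new file as int16 with scale_factor/add_offset."""
    import netCDF4  # imported here so the helper above works without netCDF4

    with netCDF4.Dataset(src_path) as src, netCDF4.Dataset(dst_path, "w") as dst:
        src_var = src.variables[var_name]
        data = src_var[:]

        # Recreate the dimensions the variable uses
        for dim_name in src_var.dimensions:
            dst.createDimension(dim_name, len(src.dimensions[dim_name]))

        scale_factor, add_offset = compute_scale_and_offset(
            float(np.min(data)), float(np.max(data))
        )

        # A new variable is needed: the on-disk type is chosen at creation time
        dst_var = dst.createVariable(var_name, "i2", src_var.dimensions, zlib=True)
        dst_var.scale_factor = scale_factor
        dst_var.add_offset = add_offset

        # With the attributes set, the float data is packed to int16 on write
        dst_var[:] = data
```

A call such as write_packed_copy("infile.nc", "outfile_packed.nc", "Band1") would write the packed copy, and reading the variable back with netCDF4 should return unpacked float values. Two caveats: a constant array gives a scale factor of zero, and with this formula the packed values span the full int16 range, so a real data value can land on -32767, the fill value used in the answer above. Reserving a level for the fill value is left out of this sketch.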
