How to encode NaN values in xarray / zarr with an integer dtype?

Question

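I have a large xarray DataArray containing NaNs and want to save it with zarr. I want to minimize the file size and am fine with losing some precision; 16 bits should be OK. I tried using the

numcodecs.FixedScaleOffset(astype='u2')

filter, but this stores all NaNs as zero. Since the data also contains zeros as valid values, this is not very helpful.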

Answer 1

NumPy's u2 (a.k.a. uint16) does not support NaN values (see this SO answer). Zarr is simply reflecting NumPy's behavior.
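A quick way to see this, as a minimal sketch that uses only NumPy: every bit pattern of an integer dtype is an ordinary number, so nothing is left over to mean "missing", and an explicit NaN assignment is rejected.

```python
import numpy as np

# uint16 represents exactly the integers 0..65535; none of them means "missing"
info = np.iinfo(np.uint16)
print(info.min, info.max)

a = np.zeros(3, dtype=np.uint16)
try:
    a[0] = np.nan  # NumPy refuses to convert NaN to an integer
    assigned = True
except ValueError as exc:
    assigned = False
    print("ValueError:", exc)
```

Any scheme that stores NaN in an integer array therefore has to reserve one of the ordinary integer values as a placeholder.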

Answer 2

It doesn't work with numcodecs.Quantize, but the xarray encoding parameter can specify a _FillValue:

dataset.to_zarr(store, encoding={'<array-name>': {'dtype': 'uint16', '_FillValue': 65535}})

See https://xarray.pydata.org/en/stable/io.html#writing-encoded-data
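To get 16-bit storage of float data this way, the values also need to be scaled; xarray's encoding dict accepts scale_factor and add_offset next to dtype and _FillValue for that purpose. The following is only a NumPy sketch of the arithmetic behind such packing, not xarray's actual implementation. The function names and the scale_factor, add_offset and fill_value numbers are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical values for illustration; they are not taken from the question.
scale_factor = 1e-4   # decoded = stored * scale_factor + add_offset
add_offset = 0.0
fill_value = 65535    # uint16 code reserved to mean "missing"

def pack(x):
    """Float array -> uint16, with NaN mapped to fill_value."""
    scaled = np.around((x - add_offset) / scale_factor)
    # replace NaN before the cast so NaN is never converted to an integer
    return np.where(np.isnan(x), fill_value, scaled).astype(np.uint16)

def unpack(stored):
    """uint16 array -> float64, with fill_value mapped back to NaN."""
    decoded = stored.astype(np.float64) * scale_factor + add_offset
    return np.where(stored == fill_value, np.nan, decoded)

x = np.array([0.0, 0.1, 1.5, np.nan])
stored = pack(x)
restored = unpack(stored)
print(stored)
print(restored)
```

With these numbers the representable range is roughly 0 to 6.55 in steps of 1e-4, and zero stays distinct from missing because missing uses the reserved code instead.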

Answer 3

In general, you can use the maximum integer as the NaN placeholder. So, if you subclass FixedScaleOffset:

import numpy as np
from numcodecs import FixedScaleOffset
from numcodecs.compat import ensure_ndarray, ndarray_copy


class FixedScaleOffsetNaN(FixedScaleOffset):
    """
    NaN version of FixedScaleOffset
    """

    codec_id = "fixedscaleoffsetnan"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.missing_int = np.iinfo(self.astype).max

    def encode(self, buf):

        # normalise input
        arr = ensure_ndarray(buf).view(self.dtype)

        # flatten to simplify implementation
        arr = arr.reshape(-1, order="A")

        # compute scale offset
        enc = (arr - self.offset) * self.scale

        # round to nearest integer
        enc = np.around(enc)

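        # swap NaN for the placeholder before the integer cast, so that NaN
        # itself is never cast to an integer (the result would be undefined)
        enc = np.where(np.isnan(arr), self.missing_int, enc).astype(self.astype)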

        return enc

    def decode(self, buf, out=None):

        # interpret buffer as numpy array
        enc = ensure_ndarray(buf).view(self.astype)

        # flatten to simplify implementation
        enc = enc.reshape(-1, order="A")

        # decode scale offset
        dec = (enc / self.scale) + self.offset

        # convert dtype
        dec = np.where(
            enc == self.missing_int, np.nan, dec.astype(self.dtype, copy=False)
        )

        # handle output
        return ndarray_copy(dec, out)

The key part is the np.where lines.

> compressor = FixedScaleOffsetNaN(0, 1e4, dtype="f8", astype="u2")
> x = np.array([0.0, 0.1, 0.11, 0.111, 0.1111, 0.11111, 1.0, 1.5, np.nan])
> y = compressor.encode(x)
[    0  1000  1100  1110  1111  1111 10000 15000 65535]
> z = compressor.decode(y)
[0.     0.1    0.11   0.111  0.1111 0.1111 1.     1.5       nan]
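One caveat of reserving the maximum integer: a valid value that happens to encode to that same integer will come back as NaN. A hedged sketch of a pre-check, using only NumPy; the helper name fits_below_placeholder is hypothetical and not part of numcodecs, and it mirrors the (arr - offset) * scale arithmetic of the codec above.

```python
import numpy as np

def fits_below_placeholder(x, offset, scale, astype="u2"):
    """Return True if every finite value encodes strictly inside the range
    that remains once the maximum integer is reserved for NaN."""
    info = np.iinfo(np.dtype(astype))
    finite = x[np.isfinite(x)]
    enc = np.around((finite - offset) * scale)
    return bool(np.all((enc >= info.min) & (enc < info.max)))

# 1.5 * 1e4 = 15000 is safely below 65535
ok = fits_below_placeholder(np.array([0.0, 1.5, np.nan]), offset=0, scale=1e4)
# 6.5535 * 1e4 rounds to 65535, which collides with the NaN placeholder
clash = fits_below_placeholder(np.array([0.0, 6.5535, np.nan]), offset=0, scale=1e4)
print(ok, clash)
```

Running such a check before encoding makes it easier to pick an offset and scale that leave the placeholder value unused by real data.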
