How do I read an HDF5 string attribute of a dataset in Rust with the hdf5-rust crate?


I have an HDF5 file with the following structure, as shown by h5dump:

❯ h5dump -n GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5
HDF5 "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5" {
FILE_CONTENTS {
 group      /
 group      /All_Data
 group      /All_Data/VIIRS-MOD-GEO-TC_All
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Height
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Latitude
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Longitude
 ...
 group      /Data_Products
 group      /Data_Products/VIIRS-MOD-GEO-TC
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Aggr
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0
 }
}

I want to use Rust (via the hdf5-rust crate) to read a string attribute of the dataset /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0. The attribute has this signature:

ATTRIBUTE "N_Granule_ID" {
   DATATYPE  H5T_STRING {
      STRSIZE 16;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
   DATA {
   (0,0): "NPP002194429582"
   }
}

I tried the following...

use anyhow::{Ok, Result};
use hdf5::File;
use ndarray::Array2;

fn main() -> Result<()> {

    let filename = "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5".to_string();
    let file = File::open(filename)?;
    let dataset = file.dataset("Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0")?;
    let attribute = dataset.attr("N_Granule_ID")?;

    // Don't know what to use here...
    let v: Array2<String> = attribute.read_2d::<String>()?;

    Ok(())
}

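This seems to work up to the point where I need to read the contents of the attribute object (attribute.read_2d(), etc.) into a Rust data type. From the DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) } entry in the attribute metadata, I believe the attribute should be read into a 2D array with a single entry (i.e. shape (1, 1)), but I'm not sure which read method and data type to use.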

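The only example provided with the hdf5-rust package reads an enum-based attribute using

attribute = attr.read_1d::<Color>()?

where Color is a user-defined enum data type that is registered with HDF5 by deriving H5Type: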
#[derive(H5Type, Clone, PartialEq, Debug)] // register with HDF5
#[repr(u8)]
pub enum Color {
    R = 1,
    G = 2,
    B = 3,
}

How do I do this for non-compound data types (f32, i32, String)?

rust hdf5 attr
1 Answer

I got a hint from one of the hdf5-rust contributors that I should use FixedAscii<size>. For an attribute attached to the root group,

let root_attr = file.attr("Mission_Name")?;

I did

let v_reader = root_attr.as_reader();
let v = v_reader.read::<FixedAscii<4>, ndarray::Dim<[usize; 2]>>()?;
println!("\tv = {:?}", v);

or, alternatively,

let v = root_attr.read_2d::<FixedAscii<4>>()?;
println!("\tv = {:?}", v);

Both give the result

v = [["NPP"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), const ndim=2

I got at the attribute payload via
if let Some(x) = v.first() {
    print!("\tx = {:?}", x.to_string());
}

which is what I was after. For the dataset attribute referenced in the original question I used

let v = attribute.read_2d::<FixedAscii<16>>()?;
println!("\tv = {:?}", v);

which gives

v = [["NPP002194429582"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), const ndim=2

Fortunately, the attributes I'm interested in have fixed sizes that I know in advance.
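To make the fixed-size requirement a bit more concrete, here is a std-only sketch of how a STRSIZE 16, H5T_STR_NULLTERM ASCII value can be pictured in memory: a 16-byte slot holding the characters followed by a NUL terminator. The helper decode_nullterm is hypothetical and is not part of hdf5-rust; it only illustrates the kind of layout that a type such as FixedAscii<16> stands in for.

```rust
/// Hypothetical helper (not part of hdf5-rust): decode a fixed-size,
/// NUL-terminated ASCII slot into an owned String.
fn decode_nullterm<const N: usize>(slot: &[u8; N]) -> Option<String> {
    // The payload ends at the first NUL byte, or fills the slot if there is none.
    let end = slot.iter().position(|&b| b == 0).unwrap_or(N);
    let payload = &slot[..end];
    // Reject anything outside 7-bit ASCII (the attribute's CSET is H5T_CSET_ASCII).
    if !payload.is_ascii() {
        return None;
    }
    // ASCII is always valid UTF-8, so this conversion succeeds for accepted input.
    std::str::from_utf8(payload).ok().map(str::to_owned)
}

fn main() {
    // A 16-byte slot: 15 characters plus one NUL terminator.
    let mut slot = [0u8; 16];
    let id = b"NPP002194429582";
    slot[..id.len()].copy_from_slice(id);

    let decoded = decode_nullterm(&slot).expect("slot should hold ASCII");
    println!("{:?} ({} chars in a {}-byte slot)", decoded, decoded.len(), slot.len());
}
```

The slot size is a compile-time constant here, which mirrors why the size has to be known in advance when a fixed-size string type is used.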

I was also able to read a "vector" string attribute (something like a list of filenames) with the signature

ATTRIBUTE "N_Anc_Filename" {
   DATATYPE  H5T_STRING {
      STRSIZE 104;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 15, 1 ) / ( 15, 1 ) }
   DATA {
   (0,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0744_1.O.0.0",
   (1,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0745_1.O.0.0",
   (2,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0746_1.O.0.0",
   (3,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0776_1.O.0.0",
   (4,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0777_1.O.0.0",
   (5,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0778_1.O.0.0",
   (6,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0779_1.O.0.0",
   (7,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0780_1.O.0.0",
   (8,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0781_1.O.0.0",
   (9,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0810_1.O.0.0",
   (10,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0811_1.O.0.0",
   (11,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0812_1.O.0.0",
   (12,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0813_1.O.0.0",
   (13,0): "off_Planet-Eph-ANC_Static_JPL_000f_20151008_200001010000Z_20000101000000Z_ee00000000000000Z_np",
   (14,0): "off_USNO-PolarWander-UT1-ANC_Ser7_USNO_000f_20181005_201810050000Z_20181005000106Z_ee20181012120000Z_np"
   }
}

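where STRSIZE=104 appears to be the length of the longest string plus a terminator: the longest filename in the output below has 103 characters, and STRPAD is H5T_STR_NULLTERM. The filenames differ in length, but as long as the parameter of FixedAscii<> is equal to or greater than the length of the longest filename, it works...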
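The relationship between the longest value and STRSIZE can be checked with a few lines of std-only Rust. The function name min_nullterm_capacity is made up for this sketch. It assumes one extra byte for the NUL terminator, which agrees with the two STRSIZE values in the h5dump output (15 characters with STRSIZE 16, and a longest filename of 103 characters with STRSIZE 104).

```rust
/// Hypothetical helper: smallest slot size (in bytes) that holds every value
/// plus a trailing NUL terminator. Returns None for an empty list.
fn min_nullterm_capacity(values: &[&str]) -> Option<usize> {
    values.iter().map(|v| v.len() + 1).max()
}

fn main() {
    // Two of the filenames from the N_Anc_Filename attribute.
    let names = [
        "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0744_1.O.0.0",
        "off_USNO-PolarWander-UT1-ANC_Ser7_USNO_000f_20181005_201810050000Z_20181005000106Z_ee20181012120000Z_np",
    ];
    match min_nullterm_capacity(&names) {
        Some(n) => println!("smallest NUL-terminated slot: {} bytes", n),
        None => println!("no values"),
    }
}
```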

println!("\n\nReading dataset (15, 1) attribute...\n");

let dset_attr = dataset.attr("N_Anc_Filename")?;

let v = dset_attr.read_2d::<FixedAscii<104>>()?;

println!("\tv.shape() = {:?}", v.shape());
println!("\tv.strides() = {:?}", v.strides());
println!("\tv.ndim() = {:?}", v.ndim());

let arr = v.iter().collect::<Vec<_>>();

for (idx, val) in arr.iter().enumerate() {
    println!("\tarr[{:?}] = {:?} ({:?})", idx, val.to_string(), val.len());
}

which gives

Reading dataset (15, 1) attribute...

v.shape() = [15, 1]
v.strides() = [1, 1]
v.ndim() = 2

arr[0] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0744_1.O.0.0" (74)
arr[1] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0745_1.O.0.0" (74)
arr[2] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0746_1.O.0.0" (74)
arr[3] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0776_1.O.0.0" (74)
arr[4] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0777_1.O.0.0" (74)
arr[5] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0778_1.O.0.0" (74)
arr[6] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0779_1.O.0.0" (74)
arr[7] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0780_1.O.0.0" (74)
arr[8] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0781_1.O.0.0" (74)
arr[9] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0810_1.O.0.0" (74)
arr[10] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0811_1.O.0.0" (74)
arr[11] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0812_1.O.0.0" (74)
arr[12] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0813_1.O.0.0" (74)
arr[13] = "off_Planet-Eph-ANC_Static_JPL_000f_20151008_200001010000Z_20000101000000Z_ee00000000000000Z_np" (94)
arr[14] = "off_USNO-PolarWander-UT1-ANC_Ser7_USNO_000f_20181005_201810050000Z_20181005000106Z_ee20181012120000Z_np" (103)

That pretty much covers the most complex use case for the files I'm reading.
