How to retrieve all variable names in a netCDF file using GDAL

Question

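I am struggling to find a way to retrieve metadata from a file using GDAL. Specifically, I would like to retrieve the band names and the order in which they are stored in a given file (which may be a GeoTIFF or a netCDF file).

For example, following the GDAL documentation, gdal.Dataset offers the "GetMetadata" method (see here and here). Although this method returns a whole set of information about the dataset, it does not provide the band names or the order in which they are stored in the file. In fact, this seems to be an old issue (dating from 2015) that has not been resolved yet (more information here). The R language appears to have solved it already (see here), but Python has not.

To be thorough: I know that other Python packages can help with this (e.g. xarray, rasterio); nevertheless, it is important to keep the set of packages used in a single script concise. I would therefore like a definitive way to find the band (a.k.a. variable) names, and the order in which they are stored in a single file, using gdal alone.

Please let me know your thoughts on this.

Below is a starting point for this problem, in which a file is opened with GDAL (creating a Dataset object).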

from osgeo import gdal

# input_dir, file_name and var are placeholders for the directory, file stem and variable name
OpeneddatasetFile = gdal.Open(f'NETCDF:{input_dir}/{file_name}.nc:' + var)

if isinstance(OpeneddatasetFile, gdal.Dataset):
    print("File opened successfully")


# This is where one should be able to fetch the variable (a.k.a. band) names
# of OpeneddatasetFile.
# Ideally, some method would return a dictionary
# with this information.

# something like:

# VariablesWithinFile = OpeneddatasetFile.getVariablesWithinFileAsDictionary()

Answer 1

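I finally found a way to retrieve the variable names from a netCDF file using GDAL, thanks to the comment by Robert Davy above.

I have organized the code into a set of functions to make it easier to follow. Note that there is also a function that reads the metadata from the netCDF file and returns it as a dictionary (see the "readInfo" function).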

from osgeo import gdal
from osgeo.gdal import Dataset, InfoOptions


def read_data(filename):

    dataset = gdal.Open(filename)

    if not isinstance(dataset, Dataset):
        raise FileNotFoundError("Impossible to open the netcdf file")

    return dataset


def readInfo(ds, infoFormat="json"):
    "how to: https://gdal.org/python/"

    info = gdal.Info(ds, options=InfoOptions(format=infoFormat))

    return info


def listAllSubDataSets(infoDict: dict):

    subDatasetVariableKeys = [x for x in infoDict["metadata"]["SUBDATASETS"].keys()
                              if "_NAME" in x]

    subDatasetVariableNames = [infoDict["metadata"]["SUBDATASETS"][x]
                               for x in subDatasetVariableKeys]

    formatedsubDatasetVariableNames = []

    for x in subDatasetVariableNames:

        # The variable name is the last colon-separated field of the subdataset name.
        s = x.replace('"', '').split(":")[-1]
        formatedsubDatasetVariableNames.append(s)

    return formatedsubDatasetVariableNames


if "__main__" == __name__:

    filename = "netcdfFile.nc"
    ds = read_data(filename)

    infoDict = readInfo(ds)

    infoDict["VariableNames"] = listAllSubDataSets(infoDict)
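
To make the parsing step above more concrete, here is a minimal sketch of what the "SUBDATASETS" metadata domain tends to look like and how the variable names fall out of it. The sample dictionary is hypothetical (the file name and the variables temperature and precipitation are made up); in practice it would come from readInfo. The helper name variable_names is also just for illustration.

```python
# Hypothetical sample of the "metadata" -> "SUBDATASETS" part of the JSON info.
# File name and variable names are illustrative only.
sample_info = {
    "metadata": {
        "SUBDATASETS": {
            "SUBDATASET_1_NAME": 'NETCDF:"netcdfFile.nc":temperature',
            "SUBDATASET_1_DESC": "[12x180x360] temperature (32-bit floating-point)",
            "SUBDATASET_2_NAME": 'NETCDF:"netcdfFile.nc":precipitation',
            "SUBDATASET_2_DESC": "[12x180x360] precipitation (32-bit floating-point)",
        }
    }
}


def variable_names(info):
    """Return the variable names in the order the subdatasets are listed."""
    # .get() avoids a KeyError when the SUBDATASETS domain is absent.
    subdatasets = info.get("metadata", {}).get("SUBDATASETS", {})
    names = []
    for key, value in subdatasets.items():
        if key.endswith("_NAME"):
            # The variable name is the last colon-separated component.
            names.append(value.replace('"', "").rsplit(":", 1)[-1])
    return names


print(variable_names(sample_info))
```

Splitting from the right keeps this working for paths that themselves contain a colon, such as Windows drive letters.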

Answer 2

You can also get an overview of the subdatasets (the "variable names") available in the netCDF file by doing the following:

ds = gdal.Open(filename)
subdatasets = ds.GetSubDatasets()

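subdatasets

will be a list of tuples, each containing (1) the path to the corresponding variable, in the form

"NETCDF:filename:var"

and (2) some information about the variable, such as its dimension sizes and data type.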
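
As a rough sketch of how those tuples could be turned into the dictionary the question asks for: the example list below is hypothetical (made-up file and variable names) and only mimics the shape of what ds.GetSubDatasets() returns; describe_subdatasets is an illustrative helper, not part of GDAL.

```python
def describe_subdatasets(subdatasets):
    """Map variable name -> (GDAL path, description), preserving list order."""
    result = {}
    for path, description in subdatasets:
        # The variable name is the last colon-separated component of the path.
        name = path.replace('"', "").rsplit(":", 1)[-1]
        result[name] = (path, description)
    return result


# Hypothetical value shaped like the return of ds.GetSubDatasets().
example = [
    ('NETCDF:"netcdfFile.nc":temperature',
     "[12x180x360] temperature (32-bit floating-point)"),
    ('NETCDF:"netcdfFile.nc":precipitation',
     "[12x180x360] precipitation (32-bit floating-point)"),
]

for name, (path, description) in describe_subdatasets(example).items():
    print(name, "->", path, "|", description)
```

Because Python dictionaries keep insertion order, the keys come out in the same order as the subdataset list.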

After that, you can open a specific variable (for example, the first one in the list) like this:

sub_ds = gdal.Open(subdatasets[0][0])
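
Once a variable is open, its bands can be walked in storage order. The sketch below is only a suggestion: it assumes the netCDF driver exposes a "NETCDF_VARNAME" metadata item on each band (which may not hold for every file or driver version) and falls back to the band description otherwise. band_labels is a hypothetical helper name; RasterCount, GetRasterBand, GetMetadataItem and GetDescription are the GDAL calls it relies on.

```python
def band_labels(dataset):
    """Return a list of (band_index, label) tuples in storage order.

    `dataset` is expected to be an opened gdal.Dataset (e.g. sub_ds).
    """
    labels = []
    for index in range(1, dataset.RasterCount + 1):  # GDAL bands are 1-based
        band = dataset.GetRasterBand(index)
        # Assumption: the netCDF driver sets NETCDF_VARNAME on each band;
        # if it is missing, use the band description instead.
        label = band.GetMetadataItem("NETCDF_VARNAME") or band.GetDescription()
        labels.append((index, label))
    return labels
```

With the dataset opened above, band_labels(sub_ds) would give one entry per band; for a GeoTIFF the labels would come from the band descriptions, which may be empty strings if the file does not set them.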
