How to iterate over all HDF5 files and save them as CSV files

Question (votes: 0, answers: 1)

I am writing Python code that loops over my SMAP HDF5 files (10,000 of them). I want to extract root-zone soil moisture and vegetation greenness. My code looks like this:

`import os
import tables
import h5py
import datetime as dt
import glob
import h5py
import matplotlib.pyplot as mpl
import numpy as np
import os
import pandas as pd
import xarray as xr

direc = '/Users/24715447/Library/CloudStorage/OneDrive-UTS/Soil_Moisture/SMAP/HDF' # the working directory (where your files are stored)
dirs = os.listdir(direc)

for idir in dirs: # this will iterate over the files in your working directory

    if idir.endswith('.h5'): # only for HDF5 files...
        hdf5 = tables.open_file(os.path.join(direc,idir))

        dataset = h5py.File(hdf5, 'r')

        longitude_values = np.array(list(dataset['cell_lon'])).flatten()
        latitude_values = np.array(list(dataset['cell_lat'])).flatten()
        soilMoisture_values = np.array(list(geo['sm_rootzone'])).flatten()
        Vegetation_greenness_values = np.array(list(geo['vegetation_greenness_fraction'])).flatten()
        dataset = pd.DataFrame({"lon": longitude_values, "lat": latitude_values, "soil_root": soilMoisture_values, "greenness": Vegetation_greenness_values})

        dataset.to_csv('/Users/24715447/Library/CloudStorage/OneDrive-UTS/Soil_Moisture/SMAP/CSV_{idir}.csv')

        hdf5.close()`

I get errors like this:

Traceback (most recent call last):
  File "/Users/24715447/Library/CloudStorage/OneDrive-UTS/Soil_Moisture/SMAP/h5tocsv.py", line 14, in <module>
    dirs = os.listdir(direc)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/24715447/Library/CloudStorage/OneDrive-UTS/Soil_Moisture/SMAP/HDF")'
(climate_process) UTS046280:SMAP 24715447$ python h5tocsv.py
Traceback (most recent call last):
  File "/Users/24715447/Library/CloudStorage/OneDrive-UTS/Soil_Moisture/SMAP/h5tocsv.py", line 14, in <module>
    dirs = os.listdir(direc)
FileNotFoundError: [Errno 2] No such file or directory: './HDF")'
(climate_process) UTS046280:SMAP 24715447$ python h5tocsv.py
Traceback (most recent call last):
  File "/Users/24715447/Library/CloudStorage/OneDrive-UTS/Soil_Moisture/SMAP/h5tocsv.py", line 19, in <module>
    hdf5 = tables.openFile(os.path.join(direc,idir))
AttributeError: module 'tables' has no attribute 'openFile'
(climate_process) UTS046280:SMAP 24715447$ python h5tocsv.py
Traceback (most recent call last):
  File "/Users/24715447/Library/CloudStorage/OneDrive-UTS/Soil_Moisture/SMAP/h5tocsv.py", line 19, in <module>
    hdf5 = tables.open_file(os.path.join(direc,idir))
  File "/Users/24715447/anaconda3/envs/climate_process/lib/python3.9/site-packages/tables/file.py", line 300, in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
  File "/Users/24715447/anaconda3/envs/climate_process/lib/python3.9/site-packages/tables/file.py", line 750, in __init__
    self._g_new(filename, mode, **params)
  File "tables/hdf5extension.pyx", line 486, in tables.hdf5extension.File._g_new
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5F.c", line 620, in H5Fopen
    unable to open file
  File "H5VLcallback.c", line 3502, in H5VL_file_open
    failed to iterate over available VOL connector plugins
  File "H5PLpath.c", line 579, in H5PL__path_table_iterate
    can't iterate over plugins in plugin path '(null)'
  File "H5PLpath.c", line 620, in H5PL__path_table_iterate_process_path
    can't open directory: /Users/24715447/anaconda3/envs/climate_process/lib/hdf5/plugin
  File "H5VLcallback.c", line 3351, in H5VL__file_open
    open failed
  File "H5VLnative_file.c", line 97, in H5VL__native_file_open
    unable to open file
  File "H5Fint.c", line 1990, in H5F_open
    unable to read superblock
  File "H5Fsuper.c", line 617, in H5F__super_read
    truncated file: eof = 120927924, sblock->base_addr = 0, stored_eof = 150224576

End of HDF5 error back trace

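I would like to get CSV files with the same file names as the HDF5 files, containing only soil moisture and vegetation greenness. I also want to clip the values to these coordinates:

box_lat = [-43.63, -10.66]
box_lon = [113.34, -153.57]

How should I modify my code? Please help me 🙏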

python hdf5
1 answer
0 votes

The error messages you are seeing point to problems with the file path and with the way the HDF5 files are opened.

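  1. File path issue: The message FileNotFoundError: [Errno 2] No such file or directory means the directory you passed could not be found. In the first two tracebacks the path ends in HDF")', which suggests that a stray ") was inside the string literal at that point. Make sure the directory path is correct and accessible; printing the direc variable is a quick way to check that it points where you expect.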

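  2. HDF5 file opening issue: The message AttributeError: module 'tables' has no attribute 'openFile' appears because the function is called open_file in current PyTables. The camelCase openFile is the older spelling from before PyTables adopted PEP 8 names in version 3.0. Update the line that opens the HDF5 file as follows: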

hdf5 = tables.open_file(os.path.join(direc, idir))
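
The two points above can be combined into a small pre-flight check. The sketch below is one possible approach and not part of the original answer: it fails early with a clear message when the directory does not exist, and it tries to open each .h5 file so that unreadable ones are reported instead of stopping the loop. The last traceback in the question ends in "truncated file", which is the kind of failure this is meant to catch. The helper name split_readable and its opener parameter are hypothetical. By default the helper is assumed to use h5py.File, which raises OSError when a file cannot be opened.

```python
from pathlib import Path


def split_readable(directory, opener=None, pattern="*.h5"):
    """Return (readable, unreadable) for the files matching pattern.

    opener is a hypothetical hook: any callable that takes a path and
    returns a context manager. It defaults to h5py.File in read-only mode.
    """
    directory = Path(directory)
    if not directory.is_dir():
        # Fail early instead of letting os.listdir raise later
        raise FileNotFoundError(f"Not a directory: {directory}")

    if opener is None:
        import h5py  # imported lazily so the helper works with other openers

        def opener(path):
            return h5py.File(path, "r")

    readable, unreadable = [], []
    for path in sorted(directory.glob(pattern)):
        try:
            # Opening and closing is enough to surface a bad file
            with opener(path):
                pass
        except OSError as err:
            unreadable.append((path, str(err)))
        else:
            readable.append(path)
    return readable, unreadable
```

Calling split_readable(direc) before the main loop gives a list of files that are safe to process and a list of (path, message) pairs for the rest. Files reported as truncated most likely need to be downloaded or synced again.
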
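  3. Clipping the values: To clip the values to the specified coordinates, build a boolean mask inside the loop that checks whether each latitude and longitude falls inside the box, and apply it before saving to CSV. NumPy boolean indexing does this. Modify the loop as follows: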
box_lat = [-43.63, -10.66]
# The question gives box_lon = [113.34, -153.57]. With the lower bound above the
# upper bound, the mask below would select nothing, so 153.57 is ASSUMED here.
box_lon = [113.34, 153.57]

for idir in dirs:
    if idir.endswith('.h5'):
        # h5py.File expects a path, not a PyTables handle, so open the file with h5py directly
        with h5py.File(os.path.join(direc, idir), 'r') as dataset:
            # 'geo' was undefined in the question; the group name 'Geophysical_Data' is an ASSUMPTION
            geo = dataset['Geophysical_Data']

            # [...] reads the whole dataset into a NumPy array
            longitude_values = dataset['cell_lon'][...].flatten()
            latitude_values = dataset['cell_lat'][...].flatten()
            soilMoisture_values = geo['sm_rootzone'][...].flatten()
            Vegetation_greenness_values = geo['vegetation_greenness_fraction'][...].flatten()

        # Filter the data to the specified coordinates
        mask = (box_lat[0] <= latitude_values) & (latitude_values <= box_lat[1]) & \
               (box_lon[0] <= longitude_values) & (longitude_values <= box_lon[1])

        # Create a DataFrame with the filtered data
        df = pd.DataFrame({
            "lon": longitude_values[mask],
            "lat": latitude_values[mask],
            "soil_root": soilMoisture_values[mask],
            "greenness": Vegetation_greenness_values[mask]
        })

        # Save to a CSV file named after the HDF5 file, without the .h5 extension
        name = os.path.splitext(idir)[0]
        df.to_csv(f'/Users/24715447/Library/CloudStorage/OneDrive-UTS/Soil_Moisture/SMAP/CSV_{name}.csv', index=False)
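
The masking step can also be factored into a helper that is easy to check in isolation. This is a minimal sketch under stated assumptions rather than part of the original answer: the function name box_mask is hypothetical, and the antimeridian branch is only one way to read a box such as the question's box_lon = [113.34, -153.57], whose first bound is larger than its second. If that value is simply a typo for 153.57, the ordinary branch applies.

```python
import numpy as np


def box_mask(lat, lon, box_lat, box_lon):
    """Boolean mask of the points inside a lat/lon box (bounds inclusive)."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)

    lat_ok = (lat >= box_lat[0]) & (lat <= box_lat[1])
    if box_lon[0] <= box_lon[1]:
        # Ordinary box: west bound <= east bound
        lon_ok = (lon >= box_lon[0]) & (lon <= box_lon[1])
    else:
        # ASSUMPTION: west bound > east bound means the box crosses the 180° meridian
        lon_ok = (lon >= box_lon[0]) | (lon <= box_lon[1])
    return lat_ok & lon_ok


# Four sample points: inside Australia, mid-Pacific, on the prime meridian, too far north
lat = np.array([-30.0, -30.0, -30.0, 10.0])
lon = np.array([130.0, -170.0, 0.0, 130.0])

print(box_mask(lat, lon, [-43.63, -10.66], [113.34, 153.57]))
print(box_mask(lat, lon, [-43.63, -10.66], [113.34, -153.57]))
```

With the first box only the first sample point is kept. With the box exactly as written in the question, the mid-Pacific point is kept as well, which is a quick way to see whether the negative longitude was intended.
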

Hope this helps.
