How to merge multiple H5 files into one H5 file using Python and h5py? [closed]

Question

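I'm new to coding in Python. I want to merge the data from 2 H5 files into one main H5 file. My goal is to add all the objects in the `SRR6XX/SRR630/*` group of each source file (the file names in the list `h5_files`) to the main (target) file (`main_h5_path`). The code below is my attempt at this. When I run it, I get this exception: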

Error occurred during H5 merging: 'Group' object has no attribute 'encode'

I also tried `create_group()`, but got the same exception.

What do I need to modify to get my code to work?

import h5py

def your_function(main_h5_path, h5_files):  # enclosing function was not shown in the post
    try:
        # read the main file dataset
        with h5py.File(main_h5_path, 'r') as h5_main_file_obj:
            # return if H5 doesn't contain any data
            if len(h5_main_file_obj.keys()) == 0:
                return
            main_file_timestamp_dtset_obj = h5_main_file_obj['/' + 'SRR6XX' + '/' + 'SRR630']

            for file in h5_files:
                with h5py.File(file, 'r') as h5_sub_file_obj:
                    # skip this source file if it doesn't contain any data
                    if len(h5_sub_file_obj.keys()) == 0:
                        continue
                    sub_file_timestamp_dtset_obj = h5_sub_file_obj['/' + 'SRR6XX' + '/' + 'SRR630']
                    # h5_main_file_obj.create_dataset(sub_file_timestamp_dtset_obj)
                    for ts_key in sub_file_timestamp_dtset_obj.keys():
                        print('ts_key', ts_key)
                        each_ts_ds = h5_sub_file_obj['/' + 'SRR6XX' + '/' + 'SRR630' + '/' + str(ts_key) + '/']
                        # a Group object is passed here where a name is expected
                        h5_main_file_obj.create_dataset(each_ts_ds)

    except (IOError, OSError, Exception) as e:
        print(f"Error occurred during H5 merging: {e}")
        return -1
    return 0

Answer 1

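My original answer only copied the group names under the group '/SRR6XX/SRR630' in the source files to the main (target) file. The OP commented that they wanted to "copy the group names and their datasets". I updated my answer to reflect that request. Only 1 line needed to change. (For reference, the line that creates the group has been commented out.)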

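The following changes to the original code are needed to make it work:

1. The main (target) file must be opened in append mode to add new objects.
2. `ts_key` is the object name (not the object). Use `.items()` to get both the name and the object (or just reference the object by name).
3. You are creating the new objects at the root level of the main (target) file. You need to modify the call to reference the appropriate group object (`main_file_timestamp_dtset_obj`).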
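As a minimal sketch of the second and third points, the snippet below shows the difference between names and objects, and between copying into a group and copying into the file root. It uses throwaway in-memory files (the `core` driver with `backing_store=False`), and the file names and the `ts_001` group are made up for illustration; they are not from the original question.

```python
import h5py
import numpy as np

# In-memory files, so nothing is written to disk (all names are illustrative).
src = h5py.File('src_demo.h5', 'w', driver='core', backing_store=False)
dst = h5py.File('dst_demo.h5', 'w', driver='core', backing_store=False)

# Intermediate groups are created automatically from the path.
src.create_dataset('SRR6XX/SRR630/ts_001/values', data=np.arange(3))
dst.create_group('SRR6XX/SRR630')

src_grp = src['SRR6XX/SRR630']
dst_grp = dst['SRR6XX/SRR630']

# .keys() yields names (str); .items() yields (name, object) pairs.
names = list(src_grp.keys())
for name, obj in src_grp.items():
    # Copy into the target *group*, not into the file root.
    src.copy(obj, dst_grp, name=name)

copied = dst['SRR6XX/SRR630/ts_001/values'][:].tolist()
at_root = 'ts_001' in dst  # False: nothing was created at the root level

src.close()
dst.close()
```

With files on disk, the target would be opened with mode `'a'` rather than `'w'`, as the first point says.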

The modified code is below:

import h5py

def your_function(main_h5_path, h5_files):
    with h5py.File(main_h5_path, 'a') as h5_main_file_obj:  # append mode is needed to add groups
        # return if H5 doesn't contain any data
        if len(h5_main_file_obj.keys()) == 0:
            return
        main_file_timestamp_dtset_obj = h5_main_file_obj['/SRR6XX/SRR630']

        for file in h5_files:
            with h5py.File(file, 'r') as h5_sub_file_obj:
                # skip this source file if it doesn't contain any data
                if len(h5_sub_file_obj.keys()) == 0:
                    continue
                sub_file_timestamp_dtset_obj = h5_sub_file_obj['/SRR6XX/SRR630']
                for ts_key in sub_file_timestamp_dtset_obj.keys():
                    print('ts_key:', ts_key)
                    # This only creates the group:
                    # main_file_timestamp_dtset_obj.create_group(ts_key)
                    # This copies the group and its objects (groups or datasets):
                    grp_path = 'SRR6XX/SRR630/' + ts_key
                    h5_sub_file_obj.copy(h5_sub_file_obj[grp_path], main_file_timestamp_dtset_obj)

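I wrote another, more compact solution that checks whether the source object is a group before copying it. See below. Another check to consider: test for name conflicts with existing groups in the main (target) file before copying each group. As noted in my comments, consider using external links to avoid duplicating data.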

import h5py

def my_function(main_h5_path, h5_files):
    with h5py.File(main_h5_path, mode='a') as h5ft:
        if len(h5ft.keys()) == 0:
            return
        for h5_source in h5_files:
            with h5py.File(h5_source, 'r') as h5fs:
                # skip empty source files (check the source, not the target)
                if len(h5fs.keys()) == 0:
                    continue
                for grp_name, h5_obj in h5fs['SRR6XX/SRR630'].items():
                    if isinstance(h5_obj, h5py.Group):
                        # This only creates the group:
                        # h5ft['SRR6XX/SRR630'].create_group(grp_name)
                        # This copies the group and its objects (groups or datasets):
                        grp_path = 'SRR6XX/SRR630/' + grp_name
                        h5fs.copy(h5fs[grp_path], h5ft['SRR6XX/SRR630'])
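The two suggestions mentioned earlier (checking for name conflicts, and using external links instead of copies) could look roughly like the sketch below. The helper name `merge_groups`, the `use_links` flag, and the demo file names are hypothetical and not part of the original answer. Unlike the code above, this version creates the target group if it is missing (via `require_group`) instead of returning early, which is a design choice made here for the demo.

```python
import os
import tempfile
import h5py

BASE = 'SRR6XX/SRR630'  # group path used in the question

def merge_groups(main_h5_path, h5_files, use_links=False):
    """Copy (or externally link) each group under BASE into the main file.

    Returns the group names that were skipped because they already exist.
    """
    skipped = []
    with h5py.File(main_h5_path, 'a') as h5ft:
        target = h5ft.require_group(BASE)  # assumption: create if missing
        for h5_source in h5_files:
            with h5py.File(h5_source, 'r') as h5fs:
                if BASE not in h5fs:
                    continue
                for grp_name, h5_obj in h5fs[BASE].items():
                    if not isinstance(h5_obj, h5py.Group):
                        continue
                    if grp_name in target:
                        # Name conflict: keep the existing group, record the skip.
                        skipped.append(grp_name)
                        continue
                    if use_links:
                        # Store a pointer to the source file instead of the data.
                        target[grp_name] = h5py.ExternalLink(
                            h5_source, f'/{BASE}/{grp_name}')
                    else:
                        h5fs.copy(h5_obj, target, name=grp_name)
    return skipped

# Small demo with temporary files (illustrative names and values).
tmp = tempfile.mkdtemp()
main_path = os.path.join(tmp, 'main.h5')
src_path = os.path.join(tmp, 'part.h5')
with h5py.File(main_path, 'w') as f:
    f.create_dataset(f'{BASE}/ts_a/values', data=[1, 2])
with h5py.File(src_path, 'w') as f:
    f.create_dataset(f'{BASE}/ts_a/values', data=[9, 9])  # clashes with main
    f.create_dataset(f'{BASE}/ts_b/values', data=[3, 4])

skipped = merge_groups(main_path, [src_path])
with h5py.File(main_path, 'r') as f:
    merged_names = sorted(f[BASE].keys())
    ts_b_values = f[f'{BASE}/ts_b/values'][:].tolist()
```

With `use_links=True` the main file stays small, but each link stores only the source file name and object path, so the source files must remain available wherever the main file is read.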
