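I'm new to Python coding. I want to merge the data from 2 H5 files into one main H5 file. My goal is to add all objects in the `SRRXX/SRR630/*` group of each source file (the file names are in the list `h5_files`) to the main (target) file (`main_h5_path`). The code below is my attempt. When I run it, I get this exception:

Error occurred during H5 merging: 'Group' object has no attribute 'encode'

I also tried `create_group()`, but got the same exception.

What do I need to change to make my code work?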
```python
import h5py

def your_function():
    try:
        # read the main file dataset
        with h5py.File(main_h5_path, 'r') as h5_main_file_obj:
            # return if H5 doesn't contain any data
            if len(h5_main_file_obj.keys()) == 0:
                return
            main_file_timestamp_dtset_obj = h5_main_file_obj['/' + 'SRR6XX' + '/' + 'SRR630']
            for file in h5_files:
                with h5py.File(file, 'r') as h5_sub_file_obj:
                    # skip this file if it doesn't contain any data
                    if len(h5_sub_file_obj.keys()) == 0:
                        continue
                    sub_file_timestamp_dtset_obj = h5_sub_file_obj['/' + 'SRR6XX' + '/' + 'SRR630']
                    # h5_main_file_obj.create_dataset(sub_file_timestamp_dtset_obj)
                    for ts_key in sub_file_timestamp_dtset_obj.keys():
                        print('ts_key', ts_key)
                        each_ts_ds = h5_sub_file_obj['/' + 'SRR6XX' + '/' + 'SRR630' + '/' + str(ts_key) + '/']
                        h5_main_file_obj.create_dataset(each_ts_ds)
    except (IOError, OSError, Exception) as e:
        print(f"Error occurred during H5 merging: {e}")
        return -1
    return 0
```
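The exception can be reproduced in isolation with a throwaway file (the scratch path and the `ts_001` group name are hypothetical). `create_dataset()` takes the *name* of the dataset to create as its first argument, so handing it an existing `Group` object fails when h5py tries to encode that "name". Depending on the h5py version this surfaces as an `AttributeError` (the message above) or a `TypeError`, so this sketch catches both.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(path, "w") as f:
    grp = f.create_group("SRR6XX/SRR630/ts_001")
    try:
        # Wrong: the first argument must be the new dataset's name (a str),
        # not an existing Group object.
        f.create_dataset(grp, data=[1, 2, 3])
        error = None
    except (AttributeError, TypeError) as e:
        error = e
    # Right: pass a string name together with the data.
    f.create_dataset("ts_copy", data=[1, 2, 3])
    created = sorted(f.keys())

print(type(error).__name__, error)
print(created)
```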
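My original answer only copied the names of the groups under `'/SRR6XX/SRR630'` in the source files to the main (target) file. The OP commented that they want to "copy the group names along with their datasets". I updated my answer to reflect that request; only 1 line needed to change. (For reference, the line that only creates the group is still there, commented out.)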
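The difference between the commented-out `create_group()` line and the `copy()` line can be checked with a small self-contained sketch. The file names and the `ts_001`/`values` member names are hypothetical stand-ins for the real data.

```python
import os
import tempfile

import h5py

# Hypothetical scratch files, for illustration only.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "source.h5")
dst_path = os.path.join(tmp_dir, "main.h5")

# Source file: one group holding one dataset.
with h5py.File(src_path, "w") as fs:
    grp = fs.create_group("SRR6XX/SRR630/ts_001")
    grp.create_dataset("values", data=[1.0, 2.0, 3.0])

with h5py.File(dst_path, "w") as ft:
    target = ft.create_group("SRR6XX/SRR630")

    # Option 1: create_group() only makes an empty group with the same name.
    target.create_group("ts_001")
    empty_members = list(target["ts_001"].keys())
    del target["ts_001"]  # remove it so the copy below does not collide

    # Option 2: copy() duplicates the group and everything under it.
    with h5py.File(src_path, "r") as fs:
        fs.copy(fs["SRR6XX/SRR630/ts_001"], target)
    copied_members = list(target["ts_001"].keys())
    copied_values = target["ts_001/values"][()].tolist()

print(empty_members)   # []
print(copied_members)  # ['values']
```

When the destination is a `Group` and no `name` is given, `copy()` reuses the last component of the source path (`ts_001` here), which is why the answer's loop does not need to pass a name.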
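Here are the changes to your original code needed to make it work:

1. Open the main file in append mode (`'a'`); a file opened in read mode (`'r'`) cannot be modified.
2. `ts_key` is the object name (not the object). Use `.items()` to get both the name and the object (or just reference the object by its name).
3. `create_dataset()` expects the *name* of a new dataset as its first argument; passing it a `Group` object is what raises `'Group' object has no attribute 'encode'`. To bring over a group together with its contents, use `copy()` with the target group (`main_file_timestamp_dtset_obj`) as the destination.

The modified code is below: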
```python
import h5py

# main_h5_path and h5_files are defined elsewhere, as in the question
def your_function():
    with h5py.File(main_h5_path, 'a') as h5_main_file_obj:  # need Append mode to add groups
        # return if H5 doesn't contain any data
        if len(h5_main_file_obj.keys()) == 0:
            return
        main_file_timestamp_dtset_obj = h5_main_file_obj['/SRR6XX/SRR630']
        for file in h5_files:
            with h5py.File(file, 'r') as h5_sub_file_obj:
                # skip this file if it doesn't contain any data
                if len(h5_sub_file_obj.keys()) == 0:
                    continue
                sub_file_timestamp_dtset_obj = h5_sub_file_obj['/SRR6XX/SRR630']
                # h5_main_file_obj.create_dataset(sub_file_timestamp_dtset_obj)
                for ts_key in sub_file_timestamp_dtset_obj.keys():
                    print('ts_key:', ts_key)
                    # This only creates group:
                    # main_file_timestamp_dtset_obj.create_group(ts_key)
                    # This copies Group and its objects (groups or datasets):
                    grp_path = 'SRR6XX/SRR630/' + ts_key
                    h5_sub_file_obj.copy(h5_sub_file_obj[grp_path], main_file_timestamp_dtset_obj)
```
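The point that `ts_key` is a name rather than an object can be verified directly. This sketch (hypothetical scratch file and member names) shows that `.keys()` yields plain strings while `.items()` yields `(name, object)` pairs, which is what makes an `isinstance(..., h5py.Group)` filter possible.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file and member names, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "keys_vs_items.h5")

with h5py.File(path, "w") as f:
    parent = f.create_group("SRR6XX/SRR630")
    parent.create_group("ts_001")
    parent.create_dataset("note", data=[0])  # a non-group member

    # .keys() yields member names (plain str), not the objects themselves.
    names = sorted(parent.keys())
    name_types = {type(name).__name__ for name in parent.keys()}

    # .items() yields (name, object) pairs, so each object can be tested directly.
    groups_only = sorted(
        name for name, obj in parent.items() if isinstance(obj, h5py.Group)
    )

print(names)        # ['note', 'ts_001']
print(name_types)   # {'str'}
print(groups_only)  # ['ts_001']
```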
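I wrote another, more compact solution that also checks that each source object is a group before copying it. See below. One more check to consider: before copying each group, test for a name clash with an existing group in the main (target) file. As mentioned in my comment, consider using external links to avoid duplicating the data.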
```python
import h5py

# main_h5_path and h5_files are defined elsewhere, as in the question
def my_function():
    with h5py.File(main_h5_path, mode='a') as h5ft:  # target (main) file
        if len(h5ft.keys()) == 0:
            return
        for h5_source in h5_files:
            with h5py.File(h5_source, 'r') as h5fs:  # source file
                if len(h5fs.keys()) == 0:  # skip empty source files
                    continue
                for grp_name, h5_obj in h5fs['SRR6XX/SRR630'].items():
                    if isinstance(h5_obj, h5py.Group):
                        # This only creates group:
                        # h5ft['SRR6XX/SRR630'].create_group(grp_name)
                        # This copies Group and its objects (groups or datasets):
                        grp_path = 'SRR6XX/SRR630/' + grp_name
                        h5fs.copy(h5fs[grp_path], h5ft['SRR6XX/SRR630'])
```
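Both extra checks suggested above (skip names that already exist in the main file; link instead of copying) can be folded into one function. This is a sketch under assumptions: `merge_groups` and its `use_links` flag are hypothetical names, the parent path is hard-coded to the question's `SRR6XX/SRR630`, and, unlike the code above, it creates the parent group in the main file when it is missing instead of returning early. An `h5py.ExternalLink` stores only a file name and an object path, so the linked data stays in the source files, which must remain at the same location for the links to resolve.

```python
import os
import tempfile

import h5py

PARENT = "SRR6XX/SRR630"  # group path taken from the question


def merge_groups(main_h5_path, h5_files, parent=PARENT, use_links=False):
    """Hypothetical helper: copy (or externally link) every subgroup of
    `parent` from each source file into the same path in the main file,
    skipping names that already exist there. Returns the names added."""
    added = []
    with h5py.File(main_h5_path, "a") as h5ft:
        target = h5ft.require_group(parent)  # create the parent if missing
        for h5_source in h5_files:
            with h5py.File(h5_source, "r") as h5fs:
                if parent not in h5fs:
                    continue  # nothing to merge from this file
                for grp_name, h5_obj in h5fs[parent].items():
                    if not isinstance(h5_obj, h5py.Group):
                        continue  # only merge groups
                    if grp_name in target:
                        print(f"skipping {grp_name!r}: already in main file")
                        continue
                    if use_links:
                        # No data is duplicated: the main file only points
                        # at the group inside the source file.
                        target[grp_name] = h5py.ExternalLink(h5_source, h5_obj.name)
                    else:
                        h5fs.copy(h5_obj, target)
                    added.append(grp_name)
    return added


# Demo with hypothetical scratch files.
tmp_dir = tempfile.mkdtemp()
main_path = os.path.join(tmp_dir, "main.h5")
src_a = os.path.join(tmp_dir, "a.h5")
src_b = os.path.join(tmp_dir, "b.h5")

with h5py.File(main_path, "w") as f:
    f.create_group(PARENT + "/ts_001")  # already present in the main file
with h5py.File(src_a, "w") as f:
    f.create_group(PARENT + "/ts_001")  # name clash, will be skipped
    f.create_group(PARENT + "/ts_002").create_dataset("v", data=[1, 2])
with h5py.File(src_b, "w") as f:
    f.create_group(PARENT + "/ts_003").create_dataset("v", data=[3, 4])

added = merge_groups(main_path, [src_a, src_b])

# Same sources, but linked into a fresh main file instead of copied.
main_linked = os.path.join(tmp_dir, "main_linked.h5")
linked = merge_groups(main_linked, [src_a, src_b], use_links=True)
with h5py.File(main_linked, "r") as f:
    values = f[PARENT + "/ts_003/v"][()].tolist()  # read through the link
    link_kind = type(f[PARENT].get("ts_003", getlink=True)).__name__
```

Copying keeps the main file self-contained at the cost of duplicated storage; linking keeps it small but ties it to the source files' paths.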