Repeatedly appending individual pandas DataFrames to an h5 file

Question

I have a small script that reads CSV files from a user-supplied directory and converts them into a single HDF5 file:

import glob
from pathlib import Path

import pandas as pd

path = input('Insert the directory path:')

file_list = []
for file in glob.glob(path):
    file_list.append(file)


for filename in file_list:
    df = pd.read_csv(filename)
    key = Path(filename).resolve().stem
    with pd.HDFStore('test.h5') as store:
        store.append(key=key, value=df, format='table', data_columns=df.columns)

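Currently, this appends each file (as a DataFrame) as its own group directly under the root.

Also, if I run the script again with another directory, it keeps appending new groups (one per file) to the root group.

What I want is for each run of the script to append that run's file groups under a new group (a subject) below the root.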

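I suspect this has to do with the key I'm passing to store.append, because right now it uses the file name as the key. I was able to pass a key manually and append the desired DataFrame, but that is not what I'm ultimately after.

Any suggestions would be great! Thanks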

Answer 1

import glob
import os

import pandas as pd

# inputs
# `path` is passed straight to glob, so it is assumed to be a pattern (e.g. one ending in *.csv)
path = input('Insert the directory path:')
group = input('Insert a group name: ')

# create a list of file paths
file_list = glob.glob(path)
# dict comprehension: keys from the file names, values from the CSV contents
dfs = {os.path.basename(os.path.normpath(filename)).split('.')[0]: pd.read_csv(filename) for filename in file_list}

# open the HDF5 file once; the context manager closes it even if an append fails
with pd.HDFStore('test.h5') as store:
    # loop through the DataFrames
    for k, df in dfs.items():
        # append df under the chosen group; the f-string builds the key "<group>/<file name>"
        store.append(f'{group}/{k}', df, format='table', data_columns=df.columns)
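
The nesting comes entirely from the key string: HDFStore treats the slashes in a key as separators between groups. As a minimal sketch of just that key-building step, the helper below (hdf_key is a hypothetical name, and "subject_01" and the file names are made-up examples, none of them from the original script) shows which keys a run would produce, without touching any HDF5 file:

```python
from pathlib import Path


def hdf_key(group: str, filename: str) -> str:
    """Build a key that places the file's table under `group` (hypothetical helper)."""
    # Path.stem drops only the final extension, so "a.b.csv" becomes "a.b";
    # the answer's split('.')[0] would yield "a" for the same file name.
    stem = Path(filename).stem
    # Strip stray slashes from the group name so the key has exactly one separator per level.
    return f"/{group.strip('/')}/{stem}"


# Example inputs are assumptions chosen for illustration only.
keys = [hdf_key("subject_01", name) for name in ["data/run1.csv", "data/run2.csv"]]
print(keys)
```

Each string could then be passed as the first argument to store.append, so running the script again with a different group name adds a sibling group instead of more tables under the root.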
