Reshaping stacked tabular data from a CSV file with pandas

Question (votes: 0, answers: 1)

I have a comma-separated input file in the format below, and I am looking for a reasonably simple and fast way to convert it into a normally shaped pandas DataFrame. The CSV file stacks all the data into two columns, with each block separated by an empty line, as shown below. Note that, to keep the explanation simple, I set the timestamp values of the three populated blocks to be identical, but in practice they can differ.

    Trace Name,SignalName1
    Signal,<signal info>
    Timestamp,Value

    Trace Name,SignalName2
    Signal,<signal info>
    Timestamp,Value

    Trace Name,SignalName3
    Signal,<signal info>
    Timestamp,Value
    2023-10-04 15:36:43.757193 EDT,13
    2023-10-04 15:36:43.829083 EDT,14
    2023-10-04 15:36:43.895651 EDT,17
    2023-10-04 15:36:43.931145 EDT,11

    Trace Name,SignalName4
    Signal,<signal info>
    Timestamp,Value
    2023-10-04 15:36:43.757193 EDT,131
    2023-10-04 15:36:43.829083 EDT,238
    2023-10-04 15:36:43.895651 EDT,413
    2023-10-04 15:36:43.931145 EDT,689

    Trace Name,SignalName5
    Signal,<signal info>
    Timestamp,Value

    Trace Name,SignalName6
    Signal,<signal info>
    Timestamp,Value
    2023-10-04 15:36:43.757193 EDT,9867
    2023-10-04 15:36:43.829083 EDT,1257
    2023-10-04 15:36:43.895651 EDT,5736
    2023-10-04 15:36:43.931145 EDT,4935

    Trace Name,SignalName7
    Signal,<signal info>
    Timestamp,Value

    Trace Name,SignalName8
    Signal,<signal info>
    Timestamp,Value

The desired output after reshaping should look like this:

                Timestamp  SignalName3  SignalName4  SignalName6
    0  10/4/2023 15:36:43           13          131         9867
    1  10/4/2023 15:36:43           14          238         1257
    2  10/4/2023 15:36:43           17          413         5736
    3  10/4/2023 15:36:43           11          689         4935


pandas dataframe csv io python-3.11
1 answer
0 votes

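    from io import StringIO

    import pandas as pd

    with open('trace.txt') as fp:
        data = []
        for row in fp:
            if row.startswith('Trace'):
                # The signal name is the second field of the "Trace Name" row
                signal = row.split(',')[1].strip()
                next(fp)  # skip the "Signal,<signal info>" row
                buf = StringIO()
                # Copy rows until the blank separator line (or end of file)
                while True:
                    row = fp.readline()
                    if row.strip():
                        buf.write(row)
                    else:
                        buf.seek(0)
                        break
                # header=0 consumes the "Timestamp,Value" row; names replaces it
                df = pd.read_csv(buf, header=0, names=['Timestamp', signal])
                if not df.empty:
                    # Strip the trailing " EDT" before parsing the timestamps
                    df['Timestamp'] = pd.to_datetime(df['Timestamp'].str[:-4])
                    data.append(df)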

Output:

    >>> data
    [                   Timestamp  SignalName3
     0 2023-10-04 15:36:43.757193           13
     1 2023-10-04 15:36:43.829083           14
     2 2023-10-04 15:36:43.895651           17
     3 2023-10-04 15:36:43.931145           11,
                        Timestamp  SignalName4
     0 2023-10-04 15:36:43.757193          131
     1 2023-10-04 15:36:43.829083          238
     2 2023-10-04 15:36:43.895651          413
     3 2023-10-04 15:36:43.931145          689,
                        Timestamp  SignalName6
     0 2023-10-04 15:36:43.757193         9867
     1 2023-10-04 15:36:43.829083         1257
     2 2023-10-04 15:36:43.895651         5736
     3 2023-10-04 15:36:43.931145         4935]
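The result is a list with one frame per non-empty signal block, not yet the single wide table the question asks for. A minimal sketch of one way to combine them, assuming `Timestamp` is the join key: an outer merge folded over the list. The helper name `to_wide` is hypothetical, and the sample frames are rebuilt here from the question's values only so the snippet runs on its own.

```python
from functools import reduce

import pandas as pd


def to_wide(frames):
    """Outer-merge per-signal frames on 'Timestamp' (hypothetical helper)."""
    if not frames:
        return pd.DataFrame(columns=["Timestamp"])
    wide = reduce(
        lambda left, right: pd.merge(left, right, on="Timestamp", how="outer"),
        frames,
    )
    # Sort so rows appear in time order regardless of merge order
    return wide.sort_values("Timestamp").reset_index(drop=True)


# Sample frames mirroring the three populated blocks in the question
ts = pd.to_datetime([
    "2023-10-04 15:36:43.757193",
    "2023-10-04 15:36:43.829083",
    "2023-10-04 15:36:43.895651",
    "2023-10-04 15:36:43.931145",
])
data = [
    pd.DataFrame({"Timestamp": ts, "SignalName3": [13, 14, 17, 11]}),
    pd.DataFrame({"Timestamp": ts, "SignalName4": [131, 238, 413, 689]}),
    pd.DataFrame({"Timestamp": ts, "SignalName6": [9867, 1257, 5736, 4935]}),
]

wide = to_wide(data)
print(wide)
```

With identical timestamps every row lines up; with an outer merge, a timestamp that appears in only one block would produce a row with `NaN` in the other signal columns.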

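However, the timestamps in your sample are identical across blocks, which I assume is not the case in your real file, so you should provide a more realistic data sample.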
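If the timestamps really do differ slightly between blocks, an exact join on `Timestamp` will not line the rows up. A hedged sketch of one possible approach is a nearest-match join with `pd.merge_asof`. The offset timestamps and the 5 ms tolerance below are invented purely for illustration and are not taken from the question's file.

```python
import pandas as pd

# Made-up example: the second signal is sampled a fraction of a millisecond later
a = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2023-10-04 15:36:43.757193",
        "2023-10-04 15:36:43.829083",
    ]),
    "SignalName3": [13, 14],
})
b = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2023-10-04 15:36:43.757500",
        "2023-10-04 15:36:43.829900",
    ]),
    "SignalName4": [131, 238],
})

# merge_asof requires both frames to be sorted on the key column
aligned = pd.merge_asof(
    a.sort_values("Timestamp"),
    b.sort_values("Timestamp"),
    on="Timestamp",
    direction="nearest",             # match the closest timestamp on either side
    tolerance=pd.Timedelta("5ms"),   # assumed tolerance; tune for the real data
)
print(aligned)
```

The result keeps the left frame's timestamps; rows with no partner inside the tolerance would get `NaN` for the right-hand signal.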
