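I have a comma-separated input file in the format below, and I'm looking for a reasonably simple/fast way to convert it into a normally shaped DataFrame in pandas. The CSV data file stacks all the data into two columns, with each data block separated by an empty line, as shown below.
Note that, to keep the explanation simple, I made the timestamp values identical across the three blocks that contain data, but in reality they can differ: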
Trace Name,SignalName1
Signal,<signal info>
Timestamp,Value

Trace Name,SignalName2
Signal,<signal info>
Timestamp,Value

Trace Name,SignalName3
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,13
2023-10-04 15:36:43.829083 EDT,14
2023-10-04 15:36:43.895651 EDT,17
2023-10-04 15:36:43.931145 EDT,11

Trace Name,SignalName4
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,131
2023-10-04 15:36:43.829083 EDT,238
2023-10-04 15:36:43.895651 EDT,413
2023-10-04 15:36:43.931145 EDT,689

Trace Name,SignalName5
Signal,<signal info>
Timestamp,Value

Trace Name,SignalName6
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,9867
2023-10-04 15:36:43.829083 EDT,1257
2023-10-04 15:36:43.895651 EDT,5736
2023-10-04 15:36:43.931145 EDT,4935

Trace Name,SignalName7
Signal,<signal info>
Timestamp,Value

Trace Name,SignalName8
Signal,<signal info>
Timestamp,Value
After reshaping, the desired output should look like this:
            Timestamp  SignalName3  SignalName4  SignalName6
0  10/4/2023 15:36:43           13          131         9867
1  10/4/2023 15:36:43           14          238         1257
2  10/4/2023 15:36:43           17          413         5736
3  10/4/2023 15:36:43           11          689         4935
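from io import StringIO

import pandas as pd

with open('trace.txt') as fp:
    data = []
    for row in fp:
        if row.startswith('Trace'):
            signal = row.split(',')[1].strip()  # e.g. 'SignalName3'
            next(fp)  # skip next row ("Signal,<signal info>")
            buf = StringIO()
            # Copy the "Timestamp,Value" header and the data rows
            # until the next blank line (or end of file).
            while True:
                row = fp.readline()
                if row.strip():
                    buf.write(row)
                else:
                    buf.seek(0)
                    break
            df = pd.read_csv(buf, header=0, names=['Timestamp', signal])
            if not df.empty:
                # Drop the trailing " EDT" before parsing the timestamps
                df['Timestamp'] = pd.to_datetime(df['Timestamp'].str[:-4])
                data.append(df)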
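For testing without a file on disk, the same loop can be wrapped in a function that accepts any text stream. This is a sketch: `parse_trace` is a hypothetical name, the inner `for` loop stands in for the `while`/`readline` pair (it stops at a blank line or at end of input the same way), and the sample string is a trimmed copy of the data from the question with blank lines between blocks.

```python
from io import StringIO

import pandas as pd


def parse_trace(fp):
    """Return one two-column DataFrame per non-empty block.

    parse_trace is a hypothetical helper; fp is any file-like text stream.
    """
    frames = []
    for row in fp:
        if not row.startswith('Trace'):
            continue
        signal = row.split(',')[1].strip()  # e.g. 'SignalName3'
        next(fp)  # skip the "Signal,<signal info>" row
        buf = StringIO()
        # Shares the iterator with the outer loop: copy the header and
        # data rows, stopping at the first blank line or at EOF.
        for line in fp:
            if not line.strip():
                break
            buf.write(line)
        buf.seek(0)
        df = pd.read_csv(buf, header=0, names=['Timestamp', signal])
        if not df.empty:
            # Strip the " EDT" suffix; the timezone information is discarded.
            df['Timestamp'] = pd.to_datetime(df['Timestamp'].str[:-4])
            frames.append(df)
    return frames


sample = """Trace Name,SignalName1
Signal,<signal info>
Timestamp,Value

Trace Name,SignalName3
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,13
2023-10-04 15:36:43.829083 EDT,14

Trace Name,SignalName4
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,131
2023-10-04 15:36:43.829083 EDT,238
"""

frames = parse_trace(StringIO(sample))
print([list(f.columns) for f in frames])
# → [['Timestamp', 'SignalName3'], ['Timestamp', 'SignalName4']]
```

With the real file, `parse_trace(open('trace.txt'))` (inside a `with` block) should give the same list as `data` above.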
Output:
>>> data
[                   Timestamp  SignalName3
 0 2023-10-04 15:36:43.757193           13
 1 2023-10-04 15:36:43.829083           14
 2 2023-10-04 15:36:43.895651           17
 3 2023-10-04 15:36:43.931145           11,
                    Timestamp  SignalName4
 0 2023-10-04 15:36:43.757193          131
 1 2023-10-04 15:36:43.829083          238
 2 2023-10-04 15:36:43.895651          413
 3 2023-10-04 15:36:43.931145          689,
                    Timestamp  SignalName6
 0 2023-10-04 15:36:43.757193         9867
 1 2023-10-04 15:36:43.829083         1257
 2 2023-10-04 15:36:43.895651         5736
 3 2023-10-04 15:36:43.931145         4935]
However, the timestamps are all identical, which I doubt is the case in your real file, so you should provide a more realistic data sample.
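To get from the list of frames to the single wide frame asked for in the question, one option is an outer merge on `Timestamp`, so timestamps that appear in only some signals are kept and the missing values become NaN. This is a sketch under that assumption, not tested against the real file; the two frames below are made-up stand-ins for `data`, with deliberately mismatched timestamps.

```python
from functools import reduce

import pandas as pd

# Made-up stand-ins for the `data` list: the second signal has one
# timestamp that the first one lacks, and vice versa.
data = [
    pd.DataFrame({'Timestamp': pd.to_datetime(['2023-10-04 15:36:43.757193',
                                               '2023-10-04 15:36:43.829083']),
                  'SignalName3': [13, 14]}),
    pd.DataFrame({'Timestamp': pd.to_datetime(['2023-10-04 15:36:43.757193',
                                               '2023-10-04 15:36:43.895651']),
                  'SignalName4': [131, 413]}),
]

# Fold the frames together pairwise with an outer join on Timestamp,
# then put the rows in time order with a fresh 0..n-1 index.
wide = reduce(lambda left, right: pd.merge(left, right, on='Timestamp', how='outer'), data)
wide = wide.sort_values('Timestamp', ignore_index=True)
print(wide)
```

If the timestamps really are identical across blocks, `pd.concat([d.set_index('Timestamp') for d in data], axis=1).reset_index()` does the same in one step. If they are only close rather than equal, an exact merge will not line them up, and something like `pd.merge_asof` with a `tolerance` would be needed instead.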