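I have a CSV input file in the following format, and I am looking for a fairly simple way to convert it into a normally shaped DataFrame in pandas. The CSV data file stacks all of the data in two columns, with each block of data separated by an empty line, as shown below.
Note that, to keep the explanation simple, I made the timestamp values identical across the three blocks, but in practice they can differ: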
Trace Name  SignalName1
Signal  <Signal Info>
Timestamp  Value
2023-10-04 15:36:43  13
2023-10-04 15:36:43  14
2023-10-04 15:36:43  17
2023-10-04 15:36:43  11

Trace Name  SignalName2
Signal  <Signal Info>
Timestamp  Value
2023-10-04 15:36:43  131
2023-10-04 15:36:43  238
2023-10-04 15:36:43  413
2023-10-04 15:36:43  689

Trace Name  SignalName3
Signal  <Signal Info>
Timestamp  Value
2023-10-04 15:36:43  9867
2023-10-04 15:36:43  1257
2023-10-04 15:36:43  5736
2023-10-04 15:36:43  4935
After reshaping, the desired output should look like this:
Timestamp SignalName1 SignalName2 SignalName3
10/4/2023 15:36:43 13 131 9867
10/4/2023 15:36:43 14 238 1257
10/4/2023 15:36:43 17 413 5736
10/4/2023 15:36:43 11 689 4935
Using re.split and pd.concat:
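import io
import re

import pandas as pd

with open('csv_file.csv') as f:
    # Split the file into blocks at blank lines, parse each block on its own,
    # then place the blocks side by side (axis=1) using the timestamps as index.
    out = pd.concat([pd.read_csv(io.StringIO(chunk), sep=r'\s\s+',  # 2+ spaces separate columns
                                 engine='python',  # regex separators need the python engine
                                 header=0, skiprows=[1, 2])  # skip "Signal ..." and "Timestamp Value"
                       .set_index('Trace Name')
                     for chunk in re.split(r'\n\n+', f.read())
                     if chunk],  # ignore empty chunks, e.g. from trailing blank lines
                    axis=1).reset_index()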
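For a quick self-contained check, the same idea can be run on an in-memory copy of the sample data instead of csv_file.csv. This is a sketch under two assumptions: columns in the file are separated by two or more spaces (which is what the \s\s+ separator above relies on), and blocks are separated by blank lines. Unlike the answer's output, it also renames the index to Timestamp so the first column matches the desired header. The helper name read_blocks is hypothetical. Like the original, it relies on every block having the same timestamp sequence; if the timestamps differ and repeat, pd.concat may not be able to align them.

```python
import io
import re

import pandas as pd

# Sample data in the layout from the question. Assumption: columns are
# separated by two or more spaces and blocks by blank lines.
RAW = """\
Trace Name  SignalName1
Signal  <Signal Info>
Timestamp  Value
2023-10-04 15:36:43  13
2023-10-04 15:36:43  14
2023-10-04 15:36:43  17
2023-10-04 15:36:43  11

Trace Name  SignalName2
Signal  <Signal Info>
Timestamp  Value
2023-10-04 15:36:43  131
2023-10-04 15:36:43  238
2023-10-04 15:36:43  413
2023-10-04 15:36:43  689

Trace Name  SignalName3
Signal  <Signal Info>
Timestamp  Value
2023-10-04 15:36:43  9867
2023-10-04 15:36:43  1257
2023-10-04 15:36:43  5736
2023-10-04 15:36:43  4935
"""


def read_blocks(text: str) -> pd.DataFrame:
    """Hypothetical helper: parse blank-line-separated blocks side by side."""
    frames = []
    # r"\n\s*\n" also treats lines containing only spaces as separators
    for chunk in re.split(r"\n\s*\n", text.strip()):
        block = pd.read_csv(
            io.StringIO(chunk),
            sep=r"\s{2,}",    # 2+ spaces; the single space inside a timestamp is kept
            engine="python",  # regex separators need the python engine
            header=0,         # "Trace Name  SignalNameN" becomes the header row
            skiprows=[1, 2],  # drop the "Signal ..." and "Timestamp  Value" lines
        )
        frames.append(block.set_index("Trace Name"))
    # axis=1 puts the value columns next to each other, aligned on the index
    out = pd.concat(frames, axis=1)
    return out.rename_axis("Timestamp").reset_index()


out = read_blocks(RAW)
print(out)
```

If real datetime values are needed, pd.to_datetime(out["Timestamp"]) converts the column afterwards.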
Output:
Trace Name SignalName1 SignalName2 SignalName3
0 2023-10-04 15:36:43 13 131 9867
1 2023-10-04 15:36:43 14 238 1257
2 2023-10-04 15:36:43 17 413 5736
3 2023-10-04 15:36:43 11 689 4935
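The question notes that the timestamps can differ between blocks in practice. One possible way to handle that, sketched here as an alternative rather than as part of the answer above, is to number repeated timestamps within each block with groupby().cumcount() so that (Timestamp, n) becomes a unique key, then outer-join the blocks on that key. The assumption, which may not hold for your data, is that the k-th reading at a given time in one trace belongs with the k-th reading at that time in the others. The sample data, the signal names SigA and SigB, the counter column n, and the helper name read_blocks_aligned are all hypothetical.

```python
import io
import re

import pandas as pd

# Hypothetical sample: the blocks share only some timestamps, and the first
# block repeats 15:36:43 twice.
RAW = """\
Trace Name  SigA
Signal  <Signal Info>
Timestamp  Value
2023-10-04 15:36:43  1
2023-10-04 15:36:43  2
2023-10-04 15:36:44  3

Trace Name  SigB
Signal  <Signal Info>
Timestamp  Value
2023-10-04 15:36:43  10
2023-10-04 15:36:45  20
"""


def read_blocks_aligned(text: str) -> pd.DataFrame:
    series = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        block = pd.read_csv(io.StringIO(chunk), sep=r"\s{2,}", engine="python",
                            header=0, skiprows=[1, 2])
        name = block.columns[1]  # signal name taken from the "Trace Name" line
        block = block.rename(columns={"Trace Name": "Timestamp"})
        block["Timestamp"] = pd.to_datetime(block["Timestamp"])
        # Number repeated timestamps 0, 1, 2, ... within this block so that
        # (Timestamp, n) is unique and can be used as a join key.
        block["n"] = block.groupby("Timestamp").cumcount()
        series.append(block.set_index(["Timestamp", "n"])[name])
    # Outer join on the unique keys; readings missing from a block become NaN.
    out = pd.concat(series, axis=1, join="outer").sort_index()
    # Drop the helper counter and turn Timestamp back into a regular column.
    return out.reset_index(level="n", drop=True).reset_index()


out = read_blocks_aligned(RAW)
print(out)
```

Because missing readings are filled with NaN, affected value columns become float. When every block has identical timestamps, as in the question's sample, this produces the same table as the pd.concat answer, only with parsed datetimes.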