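I'm looking for an efficient way to reshape a dataframe read from a tab-delimited CSV file. The data consists of event codes and messages, stacked vertically along with their timestamps. It includes a State column that specifies whether the event occurred (TRUE) or was cleared (FALSE). I tried iterating over each row and updating accordingly, but that takes a very long time to finish.
The example below shows the format of the input file: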
Timestamp State Data EventCode EventMsg Class
19-May-2023 16:10:09.301 FALSE 1 EventCode 1 EventCode 1 message class 1
19-May-2023 16:10:09.300 FALSE 2 EventCode 2 EventCode 2 message class 1
19-May-2023 16:10:09.299 TRUE 3 EventCode 1 EventCode 1 message class 2
19-May-2023 16:10:09.298 FALSE 4 EventCode 4 EventCode 4 message class 2
19-May-2023 16:10:09.297 FALSE 5 EventCode 3 EventCode 3 message class 2
19-May-2023 16:10:09.296 TRUE 6 EventCode 2 EventCode 2 message class 1
19-May-2023 16:10:09.295 TRUE 7 EventCode 4 EventCode 4 message class 2
19-May-2023 16:10:09.294 TRUE 8 EventCode 3 EventCode 3 message class 2
The desired final format is shown below:
OccurTimestamp clearTimestamp Data EventCode EventMsg Class
19-05-2023 16:10:09.299 19-05-2023 16:10:09.301 3 EventCode 1 EventCode 1 Message class 1
19-05-2023 16:10:09.296 19-05-2023 16:10:09.300 6 EventCode 2 EventCode 2 Message class 1
19-05-2023 16:10:09.295 19-05-2023 16:10:09.298 7 EventCode 4 EventCode 3 Message class 2
19-05-2023 16:10:09.294 19-05-2023 16:10:09.297 8 EventCode 3 EventCode 4 Message class 2
Based on your example, I assume each EventCode appears exactly twice (once with State = True and once with State = False). In that case, this should work:
import pandas as pd

# sort so that rows with State = True come first
# then group the values based on EventCode
# then only get the first rows for each group (those with state = True)
# then reset the index to get a normal dataframe back
# and rename the Timestamp column to OccurTimestamp
# note: ascending=False is needed here, because False sorts before True
new_df = df.sort_values(by='State', ascending=False) \
    .groupby('EventCode', group_keys=False) \
    .first() \
    .reset_index(drop=False) \
    .rename(columns={'Timestamp': 'OccurTimestamp'})
# now we just need the corresponding clearTimestamp values
# (this assumes the State column was parsed as boolean):
clear_timestamps = df.loc[~df['State'], ['Timestamp', 'EventCode']].rename(columns={'Timestamp': 'clearTimestamp'})
# and merge both dataframes based on the (I assume unique) EventCode
final = pd.merge(new_df, clear_timestamps, on = 'EventCode')
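For reference, here is a minimal end-to-end sketch of the same idea that you can run as is. It rebuilds the sample rows from the question in memory instead of reading your file, and the column boundaries (for example, that EventMsg is "EventCode 1 message" and Class is "class 1") are my guess from the flattened example. The explicit State normalisation, the check on the "exactly one TRUE and one FALSE per EventCode" assumption, and the `validate` argument on the merge are additions of mine, not something the original data requires.

```python
import io

import pandas as pd

# Sample rows copied from the question. The split into columns is an assumption.
rows = [
    ("19-May-2023 16:10:09.301", "FALSE", 1, "EventCode 1", "EventCode 1 message", "class 1"),
    ("19-May-2023 16:10:09.300", "FALSE", 2, "EventCode 2", "EventCode 2 message", "class 1"),
    ("19-May-2023 16:10:09.299", "TRUE", 3, "EventCode 1", "EventCode 1 message", "class 2"),
    ("19-May-2023 16:10:09.298", "FALSE", 4, "EventCode 4", "EventCode 4 message", "class 2"),
    ("19-May-2023 16:10:09.297", "FALSE", 5, "EventCode 3", "EventCode 3 message", "class 2"),
    ("19-May-2023 16:10:09.296", "TRUE", 6, "EventCode 2", "EventCode 2 message", "class 1"),
    ("19-May-2023 16:10:09.295", "TRUE", 7, "EventCode 4", "EventCode 4 message", "class 2"),
    ("19-May-2023 16:10:09.294", "TRUE", 8, "EventCode 3", "EventCode 3 message", "class 2"),
]
header = "Timestamp\tState\tData\tEventCode\tEventMsg\tClass\n"
raw = header + "\n".join("\t".join(map(str, r)) for r in rows)

# With a real file this would be pd.read_csv(path, sep="\t").
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Normalise State to real booleans, whether it was parsed as bool or as text.
df["State"] = df["State"].astype(str).str.upper() == "TRUE"
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d-%b-%Y %H:%M:%S.%f")

# Verify the assumption: every EventCode has one TRUE row and one FALSE row.
per_code = df.groupby("EventCode")["State"].agg(["size", "sum"])
if not ((per_code["size"] == 2) & (per_code["sum"] == 1)).all():
    raise ValueError("each EventCode must have exactly one TRUE and one FALSE row")

# Descending sort puts True first, so first() picks the "occurred" row per code.
occurred = (
    df.sort_values(by="State", ascending=False)
    .groupby("EventCode")
    .first()
    .reset_index()
    .rename(columns={"Timestamp": "OccurTimestamp"})
    .drop(columns="State")
)

# The FALSE rows supply the matching clear timestamps.
cleared = df.loc[~df["State"], ["Timestamp", "EventCode"]].rename(
    columns={"Timestamp": "clearTimestamp"}
)

# validate raises if an EventCode turns out not to be unique on either side.
final = pd.merge(occurred, cleared, on="EventCode", validate="one_to_one")
final = final[
    ["OccurTimestamp", "clearTimestamp", "Data", "EventCode", "EventMsg", "Class"]
]
print(final)
```

If an EventCode can occur and clear several times in your real file, the check above will raise, and you would need to pair rows within each EventCode some other way before merging.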