Reshape a dataframe based on event occurrence and clearing


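I'd like to know whether there is an efficient way to reshape a dataframe read from a tab-delimited CSV file. The data consists of event codes and messages stacked vertically together with their timestamps. It includes a State column that specifies whether an event occurred (TRUE) or was cleared (FALSE). I tried iterating over every row and updating accordingly, but that takes a very long time to finish.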

The example below shows the format of the input file:

Timestamp                   State   Data    EventCode   EventMsg            Class
19-May-2023 16:10:09.301    FALSE   1       EventCode 1 EventCode 1 message class 1
19-May-2023 16:10:09.300    FALSE   2       EventCode 2 EventCode 2 message class 1
19-May-2023 16:10:09.299    TRUE    3       EventCode 1 EventCode 1 message class 2
19-May-2023 16:10:09.298    FALSE   4       EventCode 4 EventCode 4 message class 2
19-May-2023 16:10:09.297    FALSE   5       EventCode 3 EventCode 3 message class 2
19-May-2023 16:10:09.296    TRUE    6       EventCode 2 EventCode 2 message class 1
19-May-2023 16:10:09.295    TRUE    7       EventCode 4 EventCode 4 message class 2
19-May-2023 16:10:09.294    TRUE    8       EventCode 3 EventCode 3 message class 2
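
For reference, a minimal sketch of loading data in this layout with pandas. It assumes the fields are separated by real tab characters and that the State column holds the literal strings TRUE and FALSE. Only two of the sample rows are reproduced, and the in-memory string stands in for the actual file.

```python
import io

import pandas as pd

# Two sample rows from the question, with real tab characters between fields.
sample = (
    "Timestamp\tState\tData\tEventCode\tEventMsg\tClass\n"
    "19-May-2023 16:10:09.301\tFALSE\t1\tEventCode 1\tEventCode 1 message\tclass 1\n"
    "19-May-2023 16:10:09.299\tTRUE\t3\tEventCode 1\tEventCode 1 message\tclass 2\n"
)

# sep="\t" selects tab-delimited parsing; TRUE/FALSE are read as booleans by default.
df = pd.read_csv(io.StringIO(sample), sep="\t")

# Parse the timestamps explicitly; "%b" expects English month abbreviations.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d-%b-%Y %H:%M:%S.%f")

print(df.dtypes)
```

For a file on disk, the `io.StringIO(sample)` argument would be replaced by the file path.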

The desired final format is shown below:

OccurTimestamp              clearTimestamp          Data    EventCode   EventMsg            Class
19-05-2023 16:10:09.299     19-05-2023 16:10:09.301 3       EventCode 1 EventCode 1 Message class 1
19-05-2023 16:10:09.296     19-05-2023 16:10:09.300 6       EventCode 2 EventCode 2 Message class 1
19-05-2023 16:10:09.295     19-05-2023 16:10:09.298 7       EventCode 4 EventCode 3 Message class 2
19-05-2023 16:10:09.294     19-05-2023 16:10:09.297 8       EventCode 3 EventCode 4 Message class 2
python dataframe csv io reshape
1 Answer

Based on your example, I assume each EventCode appears exactly twice (once with State = True and once with State = False). In that case, this should work:

import pandas as pd

# sort so that rows with State = True come first
# (False sorts before True, so this needs ascending=False)
# then group the rows by EventCode
# then keep only the first row of each group (the one with State = True)
# then reset the index to get a normal dataframe back
# and rename the Timestamp column to OccurTimestamp
new_df = df.sort_values(by='State', ascending=False) \
           .groupby('EventCode', group_keys=False) \
           .first() \
           .reset_index(drop=False) \
           .rename(columns={'Timestamp': 'OccurTimestamp'})

# now we just need the corresponding clearTimestamps:
clear_timestamps = df[df['State'] == False][['Timestamp', 'EventCode']] \
    .rename(columns={'Timestamp': 'clearTimestamp'})

# and merge both dataframes on the (I assume unique) EventCode
final = pd.merge(new_df, clear_timestamps, on='EventCode')
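
To try the idea end to end, here is a self-contained sketch on the sample rows from the question. It is a boolean-mask variant of the approach above rather than the exact sort/groupby chain: the TRUE rows and the FALSE rows are selected directly and then merged. It relies on the same assumption (each EventCode has exactly one TRUE and one FALSE row), which `validate="one_to_one"` checks at merge time. Note that for the sort-based version, the TRUE rows only come first with `ascending=False`. The timestamp reformatting at the end is an addition meant to match the desired output and assumes English month abbreviations.

```python
import pandas as pd

# Sample rows copied from the question (newest first, as in the input file).
rows = [
    ("19-May-2023 16:10:09.301", False, 1, "EventCode 1", "EventCode 1 message", "class 1"),
    ("19-May-2023 16:10:09.300", False, 2, "EventCode 2", "EventCode 2 message", "class 1"),
    ("19-May-2023 16:10:09.299", True, 3, "EventCode 1", "EventCode 1 message", "class 2"),
    ("19-May-2023 16:10:09.298", False, 4, "EventCode 4", "EventCode 4 message", "class 2"),
    ("19-May-2023 16:10:09.297", False, 5, "EventCode 3", "EventCode 3 message", "class 2"),
    ("19-May-2023 16:10:09.296", True, 6, "EventCode 2", "EventCode 2 message", "class 1"),
    ("19-May-2023 16:10:09.295", True, 7, "EventCode 4", "EventCode 4 message", "class 2"),
    ("19-May-2023 16:10:09.294", True, 8, "EventCode 3", "EventCode 3 message", "class 2"),
]
columns = ["Timestamp", "State", "Data", "EventCode", "EventMsg", "Class"]
df = pd.DataFrame(rows, columns=columns)

# Rows where the event occurred keep all their columns.
occurred = (
    df[df["State"]]
    .drop(columns="State")
    .rename(columns={"Timestamp": "OccurTimestamp"})
)

# Rows where the event was cleared only contribute their timestamp.
cleared = (
    df.loc[~df["State"], ["Timestamp", "EventCode"]]
    .rename(columns={"Timestamp": "clearTimestamp"})
)

# validate="one_to_one" raises MergeError if an EventCode is duplicated on either side.
final = occurred.merge(cleared, on="EventCode", validate="one_to_one")

# Convert "19-May-2023 ..." to "19-05-2023 ..." with millisecond precision.
for col in ["OccurTimestamp", "clearTimestamp"]:
    parsed = pd.to_datetime(final[col], format="%d-%b-%Y %H:%M:%S.%f")
    # %f prints microseconds (6 digits); drop the last 3 to keep milliseconds.
    final[col] = parsed.dt.strftime("%d-%m-%Y %H:%M:%S.%f").str[:-3]

# Put the columns in the order shown in the question.
final = final[["OccurTimestamp", "clearTimestamp", "Data", "EventCode", "EventMsg", "Class"]]
print(final)
```

Because everything here is a vectorized selection plus a single merge, it avoids the row-by-row loop mentioned in the question. If an EventCode can occur and clear more than once, the one-to-one check fails and the pairing logic would need to be extended.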