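I have a multi-indexed pandas df and need to find the row number for a specific index value. The index is an epoch time (the Timestamp column), so several rows share the same index value; I only need to find one of them.
The df looks like this: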
Temp1 Temp2 Uptime CPULoadAvg1 CPULoadAvg5 CPULoadAvg15 GPUV1 GPUV2 GPUV3 GPUV4 ts
Board Timestamp
src 1689884603 30.6 34.0 0 0.08 0.02 0.01 0 0 0 100 2023-07-20 20:22:24
1689884604 30.8 30.0 0 0.08 0.02 0.01 0 0 0 100 2023-07-20 20:22:24
1689884605 30.7 31.0 0 0.15 0.03 0.01 0 0 0 100 2023-07-20 20:22:24
1689884606 30.7 30.0 0 0.15 0.03 0.01 0 0 0 100 2023-07-20 20:22:24
1689884607 30.5 30.0 0 0.15 0.03 0.01 0 0 0 100 2023-07-20 20:22:24
... ... ... ... ... ... ... ... ... ... ... ...
coms 1690214970 47.1 53.0 2:59 2.12 2.06 2.06 689 641 876 64 2023-07-24 16:08:32
lr2 1690214970 34.3 49.0 2:59 0.31 0.13 0.04 0 0 0 100 2023-07-24 16:08:32
pp 1690214970 38.5 40.0 NaN NaN NaN NaN 0 0 0 100 2023-07-24 16:08:32
srs 1690214970 43.0 49.0 NaN NaN NaN NaN NaN NaN NaN NaN 2023-07-24 16:08:32
vel 1690214970 37.4 34.0 NaN NaN NaN NaN NaN NaN NaN NaN 2023-07-24 16:08:32
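Edit: My task is to take the last x minutes of the system log and graph the temperatures and other information. I filtered the syslog and loaded the data into a df. The user is asked for a number of minutes, and then I have the following code: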
how_far_back_to_go_sec = (int(how_far_back_to_go_mins) * 60)
first_row_timestamp = int(df.index.levels[2][0])
last_row_timestamp = int(df.index.levels[2][-1])
timestamp_to_go_to = last_row_timestamp - how_far_back_to_go_sec # Get desired Timestamp to start graph at
if timestamp_to_go_to < first_row_timestamp:
    print("Error: The logs do not go that far back. (", timestamp_to_go_to, " seconds)")
    exit(1)  # Exit the code with an error code of 1
# try 1
df_xs = df.xs(timestamp_to_go_to, level='Timestamp')
print(df_xs)
# try 2
df_loc = df.loc[(slice(None), timestamp_to_go_to),:]
print(df_loc)
This returns a KeyError. I have checked the df, and timestamp_to_go_to is a valid index value.
Added the print(df.to_dict()) output:
'39', 'src', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216120'): 
Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216122'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216122'): Timestamp('2023-07-24 16:27:44'),
For that run the key was '1690216119', but it still gives a KeyError.
Plotting code:
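The to_dict() output above points to a likely cause: the keys print as ('39', 'src', '1690216119'), so the Timestamp level seems to hold strings, while timestamp_to_go_to is an int, and an int key never matches a string label. A minimal sketch, assuming that diagnosis and using a small made-up frame (the first level's name `Run` and all values are hypothetical), reproduces the KeyError and shows two ways around it: look up with `str(...)`, or convert the level once with `MultiIndex.set_levels`.

```python
import pandas as pd

# Hypothetical miniature of the frame: three index levels, with epoch
# seconds stored as *strings* in the "Timestamp" level, as to_dict() suggests.
idx = pd.MultiIndex.from_tuples(
    [
        ("39", "coms", "1690216119"),
        ("39", "src", "1690216119"),
        ("39", "coms", "1690216120"),
    ],
    names=["Run", "Board", "Timestamp"],  # "Run" is a made-up name
)
df = pd.DataFrame({"Temp1": [47.1, 30.6, 47.2]}, index=idx)
timestamp_to_go_to = 1690216119  # an int, as computed in the question

try:
    df.xs(timestamp_to_go_to, level="Timestamp")
except KeyError:
    print("KeyError: int key vs. str labels")

# Option 1: look the value up with a string key instead.
print(df.xs(str(timestamp_to_go_to), level="Timestamp"))

# Option 2: convert the level to integers once; int keys then match.
df.index = df.index.set_levels(
    df.index.levels[2].astype(int), level="Timestamp"
)
print(df.xs(timestamp_to_go_to, level="Timestamp"))
```

Converting once is probably the cleaner choice if the timestamps are compared numerically elsewhere, as in the `timestamp_to_go_to < first_row_timestamp` check.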
import plotly.graph_objects as go

fig = go.Figure()
for board, board_df in adfs.groupby(level='Board'):  # one trace per board
    fig.add_trace(go.Scatter(x=board_df.index.get_level_values('Timestamp').values,
                             y=board_df['Temp1'],
                             mode='lines+markers',  # Display both lines and markers
                             name=board,
                             line=dict(width=1)))
fig.update_layout(xaxis=dict(title="Time", tickformat='%H:%M:%S', type='date'),
                  yaxis=dict(title='Temperature (deg C)'),
                  title=figure_title)
# Show the plot
fig.show()
This question may help with accessing the second index level: Python Pandas Accessing Values from Second Index in multi-indexed dataframe
Pick whichever of its answers fits your dataframe:
df.xs(1690206352, level='Timestamp')
df.loc[(slice(None), 1690206352),:]
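Both forms should return the same rows as long as the key's type matches the level's type; the visible difference is that xs drops the Timestamp level by default while loc keeps it. A small sketch on a made-up two-level frame shaped like the printed df (all values are hypothetical, and the timestamps are assumed to be integers):

```python
import pandas as pd

# Hypothetical two-level frame (Board, Timestamp) with integer epoch seconds.
idx = pd.MultiIndex.from_tuples(
    [("src", 1690206351), ("src", 1690206352), ("lr2", 1690206352)],
    names=["Board", "Timestamp"],
)
df = pd.DataFrame({"Temp1": [30.6, 30.8, 34.3]}, index=idx).sort_index()

# xs removes the selected level, leaving only Board in the index.
by_xs = df.xs(1690206352, level="Timestamp")

# loc with slice(None) on the first level keeps both levels.
by_loc = df.loc[(slice(None), 1690206352), :]

print(by_xs)
print(by_loc)
```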
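That said, I think the notion of a row number gets somewhat lost with a complex index. You can add an explicit row-number column and look up its value for a specific timestamp, like this: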
df['row_number'] = range(df.shape[0])
df.xs(1690206352, level='Timestamp')['row_number'].min()
Output:
5
You can also change min() to something better suited to your task, to pick just one of the several rows with timestamp 1690206352.
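If only the positional row number is needed, it can also be read straight off the index without adding a row_number column. A sketch assuming integer timestamps, using `Index.get_level_values` and `numpy.flatnonzero` (the frame and `target` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with repeated timestamps across boards.
idx = pd.MultiIndex.from_tuples(
    [
        ("src", 1690206351),
        ("lr2", 1690206351),
        ("src", 1690206352),
        ("lr2", 1690206352),
    ],
    names=["Board", "Timestamp"],
)
df = pd.DataFrame({"Temp1": [30.6, 34.3, 30.8, 34.1]}, index=idx)

target = 1690206352
# Boolean array: True where the Timestamp level equals the target.
matches = df.index.get_level_values("Timestamp") == target
# 0-based positions of every matching row.
positions = np.flatnonzero(matches)
# Take the first match, or None if there is no match.
first_row = int(positions[0]) if positions.size else None
print(positions, first_row)
```

df.iloc[first_row] then gives that row. If the level holds strings, the comparison would need str(target) instead.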
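So I ended up calling df.reset_index() to get rid of the MultiIndex. After that, I loop over every row and search for the epoch time value I want (none of the built-in searches seemed to work). Once it is found, I append the remaining rows to a new df: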
# single index way to cut down the df
rows_to_append = []
time_found = False
for row in adfs.itertuples(index=False):
    if int(row[1]) == timestamp_to_go_to:
        time_found = True
    if time_found:
        rows_to_append.append(row)
if not time_found:
    print("Log time not found, try going back to another point")
    exit(1)  # Exit because the value was not found
# After the loop, build the new DataFrame and reset its index
adfs = pd.DataFrame(rows_to_append)
adfs.reset_index(drop=True, inplace=True)
print(adfs)
It's not pretty, but it works.
Plotting it:
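The loop's rule (keep the first matching row and everything after it) can also be expressed without iterating, and converting the Timestamp column with astype(int) avoids the string-vs-int mismatch that probably defeated the built-in searches. A sketch on a made-up flattened frame with string timestamps (column names and values are hypothetical); like the loop, it assumes the rows are already in time order:

```python
import sys

import pandas as pd

# Hypothetical flattened frame, as after df.reset_index(); Timestamp holds
# epoch seconds as strings.
adfs = pd.DataFrame(
    {
        "Board": ["src", "lr2", "src", "lr2", "src"],
        "Timestamp": ["1690216118", "1690216118", "1690216119",
                      "1690216119", "1690216120"],
        "Temp1": [30.6, 34.3, 30.8, 34.1, 30.7],
    }
)
timestamp_to_go_to = 1690216119

# True where the timestamp matches exactly.
hit = adfs["Timestamp"].astype(int) == timestamp_to_go_to
# True from the first match onward, mirroring the time_found flag.
keep = hit.cumsum() > 0

if not keep.any():
    print("Log time not found, try going back to another point")
    sys.exit(1)

adfs = adfs[keep].reset_index(drop=True)
print(adfs)
```

If an exact match is not required, adfs[adfs['Timestamp'].astype(int) >= timestamp_to_go_to] is a simpler filter that also works when no log line falls on that exact second.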
for board, board_df in adfs.groupby('Board'):  # single index
    fig.add_trace(go.Scatter(
        x=board_df['Timestamp'],  # single index
        y=board_df['Temp1'],
        mode='lines+markers',  # Display both lines and markers
        name=board,  # single index
        line=dict(width=1)))
fig.update_layout(xaxis=dict(title="Time", tickformat='%H:%M:%S', tickmode='auto', nticks=20),
                  yaxis=dict(title='Temperature (deg C)'),
                  title=figure_title)
# Show the plot
fig.show()
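Since tickformat='%H:%M:%S' is a date format, the x values probably need to be real datetimes rather than epoch-second strings for the axis to show clock times. A pandas-only sketch (the column name `Time` and the data are hypothetical) converts them with pd.to_datetime(..., unit='s'); the new column could then be passed as x=board_df['Time']:

```python
import pandas as pd

# Hypothetical slice of the flattened frame with string epoch seconds.
adfs = pd.DataFrame(
    {
        "Board": ["src", "lr2", "src"],
        "Timestamp": ["1690216119", "1690216119", "1690216120"],
        "Temp1": [30.8, 34.1, 30.7],
    }
)

# Epoch seconds -> timezone-naive datetime64 values, expressed in UTC.
adfs["Time"] = pd.to_datetime(adfs["Timestamp"].astype(int), unit="s")
print(adfs)
```

The converted times are UTC; if the logs should be shown in local time, a timezone conversion would still be needed.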