Pandas df: get the row number using the second index level

Question    Votes: 0    Answers: 2

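I have a multi-indexed pandas df and need to find the row number for a particular index value. The index is epoch time (the Timestamp level), so several rows share the same index value; I only need to find one of them.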

The df looks like this:

                  Temp1  Temp2 Uptime  CPULoadAvg1  CPULoadAvg5  CPULoadAvg15 GPUV1 GPUV2 GPUV3 GPUV4                  ts
Board Timestamp                                                                                                          
src   1689884603   30.6   34.0      0         0.08         0.02          0.01     0     0     0   100 2023-07-20 20:22:24
      1689884604   30.8   30.0      0         0.08         0.02          0.01     0     0     0   100 2023-07-20 20:22:24
      1689884605   30.7   31.0      0         0.15         0.03          0.01     0     0     0   100 2023-07-20 20:22:24
      1689884606   30.7   30.0      0         0.15         0.03          0.01     0     0     0   100 2023-07-20 20:22:24
      1689884607   30.5   30.0      0         0.15         0.03          0.01     0     0     0   100 2023-07-20 20:22:24
...                 ...    ...    ...          ...          ...           ...   ...   ...   ...   ...                 ...
coms  1690214970   47.1   53.0   2:59         2.12         2.06          2.06   689   641   876    64 2023-07-24 16:08:32
lr2   1690214970   34.3   49.0   2:59         0.31         0.13          0.04     0     0     0   100 2023-07-24 16:08:32
pp    1690214970   38.5   40.0    NaN          NaN          NaN           NaN     0     0     0   100 2023-07-24 16:08:32
srs   1690214970   43.0   49.0    NaN          NaN          NaN           NaN   NaN   NaN   NaN   NaN 2023-07-24 16:08:32
vel   1690214970   37.4   34.0    NaN          NaN          NaN           NaN   NaN   NaN   NaN   NaN 2023-07-24 16:08:32

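Edit: My task is to take the last x minutes of the system log and plot temperature and other information. I filtered the syslog and loaded the data into a df. I ask the user for the number of minutes and then run the following code: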

how_far_back_to_go_sec = (int(how_far_back_to_go_mins) * 60)
first_row_timestamp = int(df.index.levels[2][0])
last_row_timestamp = int(df.index.levels[2][-1])

timestamp_to_go_to = last_row_timestamp - how_far_back_to_go_sec      # Get desired Timestamp to start graph at

if timestamp_to_go_to < first_row_timestamp:
    print("Error: The logs do not go that far back. (",timestamp_to_go_to," seconds)")
    exit(1)  # Exit the code with an error code of 1

# try 1
df_xs = df.xs(timestamp_to_go_to, level='Timestamp')
print(df_xs)

# try 2
df_loc = df.loc[(slice(None), timestamp_to_go_to),:]
print(df_loc)

Both attempts raise a KeyError. I have checked the df, and timestamp_to_go_to is a valid index value.


Added the print(df.to_dict()) output:

'39', 'src', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216120'): 
Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216122'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216122'): Timestamp('2023-07-24 16:27:44'), 

For that run, the key was '1690216119', but the lookup still raised a KeyError.


Plotting code:

for idn, adfs in adfs.groupby(['Board'], level=0):
    fig.add_trace(go.Scatter(x=adfs.index.get_level_values('Timestamp').values,
                             y=adfs['Temp1'],
                             mode='lines+markers',  # Display both lines and markers
                             name=adfs.index.get_level_values('Board').values[0],
                             line=dict(width=1)))

fig.update_layout(xaxis=dict(title="Time", tickformat='%H:%M:%S', type='date'),
                  yaxis=dict(title='Temperature (deg C)'),
                  title=figure_title)
# Show the plot
fig.show()
pandas dataframe multi-index
2 Answers

0 votes

This question may help with accessing the second index level: "Python Pandas Accessing Values from Second Index in multi-indexed dataframe".

Pick whichever of its answers fits your dataframe:

df.xs(1690206352, level='Timestamp')

df.loc[(slice(None), 1690206352),:]
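
Both lookups need a key of the same type as the values stored in the Timestamp level. The to_dict() output in the question shows the keys as strings ('1690216119'), so an integer key would plausibly raise a KeyError there. A minimal sketch, assuming a small made-up frame with a string Timestamp level (the data and the names by_str, fixed and by_int are illustrative, not taken from the original df):

```python
import pandas as pd

# Toy frame: the Timestamp level holds strings, as in the to_dict() output.
idx = pd.MultiIndex.from_tuples(
    [("src", "1690216118"), ("srs", "1690216118"),
     ("src", "1690216119"), ("srs", "1690216119")],
    names=["Board", "Timestamp"],
)
df = pd.DataFrame({"Temp1": [30.6, 43.0, 30.8, 43.1]}, index=idx)

# An int key does not match any of the string labels.
try:
    df.xs(1690216119, level="Timestamp")
    int_lookup_failed = False
except KeyError:
    int_lookup_failed = True

# Option 1: look up with a string key.
by_str = df.xs("1690216119", level="Timestamp")

# Option 2: convert the level to int once, then use int keys.
fixed = df.copy()
fixed.index = fixed.index.set_levels(
    fixed.index.levels[1].astype(int), level="Timestamp"
)
by_int = fixed.xs(1690216119, level="Timestamp")
print(by_int)
```

Checking df.index.levels and their dtype on the real frame would confirm whether this is what is happening.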

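That said, I think the notion of a row number gets somewhat lost with a complex index. You can add an explicit row-number column and look up its value for a specific timestamp, like this: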

df['row_number'] = range(df.shape[0])
df.xs(1690206352, level='Timestamp')['row_number'].min()

Output:

5

You can also replace min() with something better suited to your task, to pick just one of the several rows whose timestamp is 1690206352.
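
As a self-contained illustration of the row-number idea, here is a minimal sketch on a made-up four-row frame with an integer Timestamp level. The data is hypothetical, and the second variant, which skips the helper column, is an alternative not mentioned above:

```python
import numpy as np
import pandas as pd

# Toy frame with an integer Timestamp level (values are made up).
idx = pd.MultiIndex.from_tuples(
    [("src", 1690206351), ("srs", 1690206351),
     ("src", 1690206352), ("srs", 1690206352)],
    names=["Board", "Timestamp"],
)
df = pd.DataFrame({"Temp1": [30.6, 43.0, 30.8, 43.1]}, index=idx)

# Explicit row-number column, then the smallest one for the timestamp.
df["row_number"] = range(df.shape[0])
first_row = df.xs(1690206352, level="Timestamp")["row_number"].min()

# Same position without a helper column: compare the level values directly.
matches = np.flatnonzero(df.index.get_level_values("Timestamp") == 1690206352)
first_pos = matches[0]
print(first_row, first_pos)
```

Both variants give a zero-based position that can be passed to df.iloc.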


0 votes

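So I ended up calling df.reset_index() to get rid of the multi-index. After that, I loop over every row and search for the epoch time value I want (none of the built-in search methods seemed to work). Once it is found, I append that row and all remaining rows to a new df.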

# single index way to cut down the df
rows_to_append = []
time_found = False
for row in adfs.itertuples(index=False):
    if int(row[1]) == timestamp_to_go_to:                   
        time_found = True
    if time_found:
        rows_to_append.append(row)
if not time_found:
    print("Log time not found, try going back to another point")
    exit(1) # Exit cause value not found

# After the loop, reset the index of the new DataFrame
adfs = pd.DataFrame(rows_to_append)
adfs.reset_index(drop=True, inplace=True)
print(adfs)

It's not pretty, but it works.
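
The same trimming can probably be done without an explicit loop. A minimal sketch, assuming the frame after reset_index() has a Timestamp column of numeric strings (the toy data below is made up, and is_target, start and trimmed are illustrative names):

```python
import pandas as pd

# Toy stand-in for the frame after reset_index(); values are made up.
adfs = pd.DataFrame({
    "Board": ["src", "srs", "src", "srs", "src", "srs"],
    "Timestamp": ["1690216118", "1690216118", "1690216119",
                  "1690216119", "1690216120", "1690216120"],
    "Temp1": [30.6, 43.0, 30.8, 43.1, 30.7, 43.2],
})
timestamp_to_go_to = 1690216119

# True on rows whose timestamp equals the target (cast mirrors int(row[1])).
is_target = adfs["Timestamp"].astype(int) == timestamp_to_go_to
if not is_target.any():
    raise SystemExit("Log time not found, try going back to another point")

# argmax() on a boolean array returns the position of the first True,
# so the slice keeps the first match and every row after it.
start = int(is_target.to_numpy().argmax())
trimmed = adfs.iloc[start:].reset_index(drop=True)
print(trimmed)
```

Like the loop, this keeps rows by position rather than by value, so it does not require the frame to be sorted by timestamp.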

Plotting:

for idn, adfs in adfs.groupby(['Board']):  # single index
    fig.add_trace(go.Scatter(x=adfs['Timestamp'],  # single index
                             y=adfs['Temp1'],
                             mode='lines+markers',  # Display both lines and markers
                             name=adfs['Board'].values[0],  # single index
                             line=dict(width=1)))

fig.update_layout(xaxis=dict(title="Time", tickformat='%H:%M:%S', tickmode='auto', nticks=20),
                  yaxis=dict(title='Temperature (deg C)'),
                  title=figure_title)
# Show the plot
fig.show()