Pandas DataFrame: get the 3 largest values in each row and their column names

Question

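There are many examples on the forum of how to find the maximum value of a row together with the corresponding column name. Some examples are here or here.

What I want to do is a specific modification of the examples above. My dataframe looks like this, where all columns are numbered from left to right (this order is very important):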

x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
  0   0   1   2   2   0   0   0   0    0
  4   4   0   4   4   1   0   0   0    0
  0   0   1   2   3   0   0   0   0    0

Now I want to create 6 new columns at the end of each row, containing the column names and the largest values in the row.

x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
  0   0   1   2   2   0   0   0   0    0
  4   4   0   4   4   1   0   0   0    0
  0   0   1   2   3   0   0   0   0    0

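If a row has more than one maximum value (for example, the value 2 in the first row), I want to save in column Max1 only the one column name with the smallest index. In that row the second-largest value is also 2, but its column has a larger index. This means that each "Max(y)" column must hold exactly one column name. This is the main condition. Likewise, if a row has more than 3 maximum values, only the 3 column names with the smallest indices should be saved. So the final output should be a DF like this: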

x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
  0   0   1   2   2   0   0   0   0    0  x_4       2  x_5       2  x_3       1
  4   4   0   4   4   1   0   0   0    0  x_1       4  x_2       4  x_4       4
  0   0   1   2   3   0   0   0   0    0  x_5       3  x_4       2  x_3       1

To summarize, we get the following result: in the first row 4 < 5, so x_4 comes first (the second 2 follows in the very next column). In the second row 1 < 2 < 4 < 5, but we have only 3 columns, so x_5 is missing from the final result. In the third row, indices play no role, because the top values in the row are strictly different. This is also the main condition.

Answer 1

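Use the following code block. It first creates a copy of the dataframe, df_copy, in which the column names are replaced by their numeric positions (since the order matters, as you mentioned). It then applies a function to each row to get the positions of the 3 largest values. Series.nlargest keeps the first occurrence among ties by default (keep='first'), which is what makes the column with the smaller index win. These positions are then mapped back to the original column names. Finally, it retrieves the values of those columns and reorders the columns as expected.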

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'x_1': [0, 4, 0],
    'x_2': [0, 4, 0],
    'x_3': [1, 0, 1],
    'x_4': [2, 4, 2],
    'x_5': [2, 4, 3],
    'x_6': [0, 1, 0],
    'x_7': [0, 0, 0],
    'x_8': [0, 0, 0],
    'x_9': [0, 0, 0],
    'x_10': [0, 0, 0]
})

# Create a copy of the dataframe and replace column names with their corresponding numeric index
df_copy = df.copy()
df_copy.columns = np.arange(len(df.columns))

# Apply a function to each row (axis=1) to get the indices of the top 3 max values
df[['Max1', 'Max2', 'Max3']] = df_copy.apply(lambda row: row.nlargest(3).index, axis=1, result_type='expand')

# Map the numeric indices back to column names
# (Series.map per column; DataFrame.applymap is deprecated since pandas 2.1)
df[['Max1', 'Max2', 'Max3']] = df[['Max1', 'Max2', 'Max3']].apply(lambda col: col.map(lambda x: df.columns[int(x)]))

# Get the values
df[['ValMax1', 'ValMax2', 'ValMax3']] = df.apply(lambda row: [row[row['Max1']], row[row['Max2']], row[row['Max3']]], axis=1, result_type='expand')

# Reorder the columns
column_order = ['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9', 'x_10', 'Max1', 'ValMax1', 'Max2', 'ValMax2', 'Max3', 'ValMax3']
df = df[column_order]
df

Result (as expected):

x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
  0   0   1   2   2   0   0   0   0    0  x_4       2  x_5       2  x_3       1
  4   4   0   4   4   1   0   0   0    0  x_1       4  x_2       4  x_4       4
  0   0   1   2   3   0   0   0   0    0  x_5       3  x_4       2  x_3       1
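
As a possible alternative to the row-wise apply calls above, the same tie-breaking rule can be sketched with a stable NumPy sort. This is not part of the original answer: the helper name top_k_with_names is hypothetical, and the sketch assumes all x_* columns hold signed numeric values with no NaN. Sorting the negated values with kind="stable" orders each row from largest to smallest while keeping equal values in their left-to-right column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 1, 2, 2, 0, 0, 0, 0, 0],
     [4, 4, 0, 4, 4, 1, 0, 0, 0, 0],
     [0, 0, 1, 2, 3, 0, 0, 0, 0, 0]],
    columns=[f"x_{i}" for i in range(1, 11)],
)

def top_k_with_names(frame, k=3):
    """Hypothetical helper: per row, the k largest values and their column names."""
    values = frame.to_numpy()
    # Stable sort of the negated values: descending by value,
    # ties stay in their original left-to-right column order.
    order = np.argsort(-values, axis=1, kind="stable")[:, :k]
    top_values = np.take_along_axis(values, order, axis=1)
    top_names = frame.columns.to_numpy()[order]
    out = {}
    for i in range(k):
        out[f"Max{i + 1}"] = top_names[:, i]
        out[f"ValMax{i + 1}"] = top_values[:, i]
    return pd.DataFrame(out, index=frame.index)

# Append the six new columns to the right of the original ones
result = pd.concat([df, top_k_with_names(df)], axis=1)
print(result)
```

Because the helper builds the new columns in the order Max1, ValMax1, Max2, ValMax2, Max3, ValMax3, no separate reordering step is needed, and k can be changed if a different number of top values is wanted.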
