How can I add each item's second-smallest value to df, counting only the values in the selected columns that are greater than the value in 'Col7'?
import pandas as pd
my_dict = {
    'Item1': ['Col1', 'Col3', 'Col6'],
    'Item2': ['Col2', 'Col4', 'Col6', 'Col8'],
    'Item3': ['Col1', 'Col3', 'Col6']
}
df = pd.DataFrame({
    'Col0': ['Item1', 'Item2', 'Item3'],
    'Col1': [20, 25, 28],
    'Col2': [89, 15, 35],
    'Col3': [36, 30, 96],
    'Col4': [40, 108, 13],
    'Col5': [55, 2, 9],
    'Col6': [35, 38, 27],
    'Col7': [30, 20, 39],
})
The result should be:
df = pd.DataFrame({
    'Col0': ['Item1', 'Item2', 'Item3'],
    'Col1': [20, 25, 28],
    'Col2': [89, 15, 35],
    'Col3': [36, 30, 96],
    'Col4': [40, 108, 13],
    'Col5': [55, 2, 9],
    'Col6': [35, 38, 27],
    'Col7': [30, 20, 39],
    'second min': [36, 108, 'NaN']
})
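You can do this by iterating over the dictionary items, selecting the columns listed for each item, keeping only the values greater than the one in 'Col7', and then taking the second-smallest of the values that remain: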
import pandas as pd
import numpy as np
my_dict = {
    'Item1': ['Col1', 'Col3', 'Col6'],
    'Item2': ['Col2', 'Col4', 'Col6', 'Col8'],
    'Item3': ['Col1', 'Col3', 'Col6']
}
df = pd.DataFrame({
    'Col0': ['Item1', 'Item2', 'Item3'],
    'Col1': [20, 25, 28],
    'Col2': [89, 15, 35],
    'Col3': [36, 30, 96],
    'Col4': [40, 108, 13],
    'Col5': [55, 2, 9],
    'Col6': [35, 38, 27],
    'Col7': [30, 20, 39],
})
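second_min_values = []
for item, cols in my_dict.items():
    # Keep only the listed columns that exist in df ('Col8' does not)
    selected_cols = [col for col in cols if col in df.columns and col != 'Col7']
    row = df['Col0'] == item
    threshold = df.loc[row, 'Col7'].values[0]
    selected_values = df.loc[row, selected_cols].values.flatten()
    # Keep only the values greater than this row's 'Col7'
    selected_values = [val for val in selected_values if val > threshold]
    if len(selected_values) < 2:
        # Fewer than two candidates, so there is no second minimum
        second_min_values.append(np.nan)
    else:
        # np.partition places the second-smallest value at index 1
        second_min_values.append(np.partition(selected_values, 1)[1])
df['second min'] = second_min_values
print(df)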
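One caveat with the loop above: it appends results in dictionary order, so it assumes the keys of my_dict appear in the same order as the rows of df. A minimal row-wise sketch that looks up each row's own column list avoids that assumption. The helper name second_min_above and its parameters are hypothetical, introduced only for this illustration:

```python
import numpy as np
import pandas as pd

my_dict = {
    'Item1': ['Col1', 'Col3', 'Col6'],
    'Item2': ['Col2', 'Col4', 'Col6', 'Col8'],
    'Item3': ['Col1', 'Col3', 'Col6']
}
df = pd.DataFrame({
    'Col0': ['Item1', 'Item2', 'Item3'],
    'Col1': [20, 25, 28],
    'Col2': [89, 15, 35],
    'Col3': [36, 30, 96],
    'Col4': [40, 108, 13],
    'Col5': [55, 2, 9],
    'Col6': [35, 38, 27],
    'Col7': [30, 20, 39],
})

def second_min_above(row, mapping, threshold_col='Col7'):
    """Second-smallest value among the row's mapped columns that exceed threshold_col."""
    # Columns listed for this item that actually exist in the row
    cols = [c for c in mapping.get(row['Col0'], []) if c in row.index]
    values = row[cols].astype(float).to_numpy()
    # Keep only the values greater than the threshold
    values = values[values > row[threshold_col]]
    if values.size < 2:
        return np.nan
    return np.partition(values, 1)[1]

# Reverse the rows to show that the result no longer depends on their order
shuffled = df.iloc[::-1].reset_index(drop=True)
shuffled['second min'] = shuffled.apply(second_min_above, axis=1, mapping=my_dict)
print(shuffled[['Col0', 'second min']])
```

With the rows reversed, Item3 still gets NaN, Item2 gets 108.0, and Item1 gets 36.0.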
Alternatively, use a custom function with groupby.apply, and use numpy.partition to get the second-smallest value:
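def get_nth(g, N=2):
    # Select this group's columns; names missing from df (e.g. 'Col8') become NaN
    tmp = g.reindex(columns=my_dict.get(g.name))
    # Mask values not greater than 'Col7', then take the N-th smallest per row
    # (NaN sorts last, so rows with fewer than N valid values yield NaN)
    return pd.Series(np.partition(tmp.where(tmp.gt(g['Col7'], axis=0)),
                                  N-1, axis=1)[:, N-1], index=g.index)

# include_groups requires pandas >= 2.2
df['second min'] = (df.groupby('Col0', group_keys=False)
                      .apply(get_nth, include_groups=False)
                   )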
Output:
    Col0  Col1  Col2  Col3  Col4  Col5  Col6  Col7  second min
0  Item1    20    89    36    40    55    35    30        36.0
1  Item2    25    15    30   108     2    38    20       108.0
2  Item3    28    35    96    13     9    27    39         NaN
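To see why the last row comes out as NaN, it may help to look at numpy.partition on its own. The sketch below uses a small hand-built array standing in for the masked values (entries not above 'Col7' replaced with NaN, and the missing 'Col8' column left out for brevity). The array and variable names are illustrative only:

```python
import numpy as np

# Each row: the selected columns after masking values not greater than 'Col7'
masked = np.array([
    [np.nan, 36.0, 35.0],     # Item1: 20 is masked, 36 and 35 remain
    [np.nan, 108.0, 38.0],    # Item2: 15 is masked, 108 and 38 remain
    [np.nan, 96.0, np.nan],   # Item3: only 96 remains
])

N = 2
# After partitioning, position N-1 of each row holds the N-th smallest value;
# NaN is ordered after every number, so it only surfaces when fewer than
# N real values are left in the row.
second = np.partition(masked, N - 1, axis=1)[:, N - 1]
print(second)
```

The first two rows give 36.0 and 108.0, while the third row has a single valid value and so yields NaN.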