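I have a scipy distance_matrix stored as a DataFrame. How can I extract the minimum value of each row (excluding 0.00), together with the (row, column) labels associated with that value?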
For example:
The min for the first row would be [0.012885, 'king', 'boy']
The min for the second row would be [2.826742, 'wise', 'bananas']
Code for the DataFrame:
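import pandas as pd
import scipy.spatial  # import the submodule explicitly so scipy.spatial is available
...
# pairwise Euclidean distances between the (x1, x2) points, labelled by word
df = pd.DataFrame(scipy.spatial.distance_matrix(w2v_df[['x1', 'x2']],
                                                w2v_df[['x1', 'x2']]),
                  index=w2v_df['word'],
                  columns=w2v_df['word'])
print(df)
print(df.size)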
Output:
<class 'pandas.core.frame.DataFrame'>
word king wise queen ... kind man boy
word ...
king 0.000000 7.917140 10.963772 ... 5.811759 3.180582 0.012885
wise 7.917140 0.000000 6.642557 ... 10.990575 9.957878 7.908536
queen 10.963772 6.642557 0.000000 ... 10.347096 11.126121 10.951130
trees 9.954951 3.937842 2.917539 ... 10.940161 10.948519 9.943392
lab 7.437203 11.811392 10.148030 ... 1.716404 4.612150 7.429358
prince 3.180829 9.958469 11.126762 ... 2.897802 0.000654 3.177194
monkeys 10.007491 3.958035 2.926149 ... 10.995299 11.004550 9.995942
girl 5.820748 5.026462 5.153798 ... 6.336225 6.244742 5.808014
woman 10.663214 8.143587 2.350959 ... 8.843283 10.155728 10.650332
princess 5.204497 5.744348 5.894201 ... 5.439997 5.356606 5.191617
cat 3.033364 5.678351 10.397241 ... 8.359144 6.077646 3.031699
dinosaurs 5.745362 6.422390 5.683175 ... 5.075057 5.442950 5.732531
person 9.421978 10.901532 7.192433 ... 5.081030 7.477618 9.410744
bananas 5.238502 2.826742 8.147972 ... 9.239873 7.668165 5.231329
partner 7.752175 10.135952 7.572307 ... 3.468261 5.742199 7.741316
rat 8.830544 8.633246 4.739600 ... 6.113317 7.734904 8.818027
kind 5.811759 10.990575 10.347096 ... 0.000000 2.897668 5.804801
man 3.180582 9.957878 11.126121 ... 2.897668 0.000000 3.176944
boy 0.012885 7.908536 10.951130 ... 5.804801 3.176944 0.000000
[19 rows x 19 columns]
I have tried the following (the associated values still need to be attached):
# keep only the non-zero distances (zeros become NaN and are skipped by idxmin)
df1 = df[df != 0]
print(df1.idxmin())
Output:
word
trees monkeys
rat trees
person partner
monkeys trees
king boy
girl queen
princess woman
dinosaurs wise
lab kind
man prince
boy king
woman queen
prince man
wise dinosaurs
partner person
queen woman
bananas person
cat princess
kind lab
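Note that the distance matrix is symmetric. So for each word you can sort by its column, for example df.sort_values(by='king'), and take the second entry of that column with .iloc[1] (the first entry is the zero self-distance). Alternatively, you can just use the min function and store the results in a list.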
I did it this way, and it works fine for a small DataFrame like yours.
# replace the zero self-distances with a large sentinel so min() skips them
df = df.replace(0, 99999)  # or: df.replace(0, 999, inplace=True)
# get the min for, e.g., 'king'
min_king = df['king'].min()
[min_king, 'king', df[df['king'] == min_king].index.values[0]]
Then loop this block over every column to get the result for each word.
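As a rough sketch of that loop, the block can be generalized to every word. The tiny 3-word matrix here is a hypothetical stand-in for the real 19x19 one (its values are copied from the printed output above), and masking the zeros with NaN via DataFrame.mask is a swapped-in alternative to the 99999 sentinel, not what the snippet above does. A vectorized equivalent using min(axis=1) and idxmin(axis=1) is included for comparison.

```python
import pandas as pd

# Hypothetical 3-word distance matrix standing in for the real 19x19 one.
words = ['king', 'wise', 'boy']
df = pd.DataFrame([[0.0, 7.917140, 0.012885],
                   [7.917140, 0.0, 7.908536],
                   [0.012885, 7.908536, 0.0]],
                  index=words, columns=words)

# Turn zeros into NaN so min/idxmin skip them (no sentinel value needed).
masked = df.mask(df == 0)

# Loop version: one [min value, row label, column label] entry per row.
results = []
for word, row in masked.iterrows():
    results.append([row.min(), word, row.idxmin()])

# Vectorized equivalent: row-wise minimum and the column label where it occurs.
summary = pd.DataFrame({'min_dist': masked.min(axis=1),
                        'nearest': masked.idxmin(axis=1)})

print(results)
print(summary)
```

Because the matrix is symmetric, working row by row (as here) or column by column (as in df['king'].min()) should give the same pairs.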