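I am writing a program to get the 3 nearest neighbors of every data point in my dataset. The dataset has 47 features and 5000 rows, and it has no target variable. I have converted the whole dataset into a numpy array. I am developing the following code but seem to be stuck: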
from sklearn.neighbors import NearestNeighbors

X = df.to_numpy()

def findsuccess(id):
    nbrs = NearestNeighbors(n_neighbors=3)
    nbrs.fit(X)
    pred = nbrs.kneighbors(X, 3)
    for i in pred:
        # What should come here? I need to print my 3 neighbours at this step
        pass
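After this, I want to pass the ID whose neighbors I am looking for to the findsuccess function and write the resulting list to Excel. For example, the function call would be:
findsuccess(1234)
The end goal is to find the 3 nearest neighbors of id 1234 in my dataset and print them to an output file like this: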
id Neigh1 Neigh2 Neigh3
1234 1334 1444 1555
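I think the kneighbors_graph method of NearestNeighbors is what you need. It returns a sparse matrix with a 1 wherever a point is one of the k nearest neighbors of the query point in that row, and a 0 everywhere else.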
>>> X = [[0], [3], [1]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=2)
>>> neigh.fit(X)
NearestNeighbors(n_neighbors=2)
>>> A = neigh.kneighbors_graph(X)
>>> A.toarray()
array([[1., 0., 1.],
       [0., 1., 1.],
       [1., 0., 1.]])
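To make the matrix above easier to read, note that the column positions of the 1s in each row are the row indices of that point's neighbors. The sketch below reuses the same toy X; the helper name neighbor_indices is made up for illustration and is not part of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = [[0], [3], [1]]
neigh = NearestNeighbors(n_neighbors=2).fit(X)
A = neigh.kneighbors_graph(X)  # sparse connectivity matrix

def neighbor_indices(graph):
    """Hypothetical helper: list the nonzero column positions of each row."""
    dense = graph.toarray()
    return [np.flatnonzero(row).tolist() for row in dense]

rows = neighbor_indices(A)
print(rows)
```

Because X is passed back in as the query, every point counts itself as one of its own neighbors (distance 0), which is why each row lists its own index.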
You can do something like this:
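import numpy as np
from sklearn.neighbors import NearestNeighbors

X = df.to_numpy()
nbrs = NearestNeighbors(n_neighbors=3)
nbrs.fit(X)

def findsuccess(id):
    i = <get the index of id>  # row position of this id in X
    # kneighbors_graph expects a 2-D query; X[[i]] keeps shape (1, n_features)
    graph = nbrs.kneighbors_graph(X[[i]])
    # Column positions of the nonzero entries are the neighbor rows (the point itself is included)
    neighbors_f_id = np.where(graph.toarray())[1]
    print('Neighbors of id', neighbors_f_id)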
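For the output format asked for in the question (id, Neigh1, Neigh2, Neigh3), here is a minimal end-to-end sketch. It makes several assumptions that are not in the original post: the ids are stored as the DataFrame index, the tiny df below is made-up stand-in data, and the query point itself should not be listed as a neighbor. It also uses kneighbors rather than kneighbors_graph, since kneighbors returns the neighbor indices directly, already sorted by distance.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Made-up stand-in data: 6 rows, 2 features, ids as the index (assumption).
df = pd.DataFrame(
    {"f1": [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
     "f2": [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]},
    index=[1234, 1334, 1444, 1555, 1666, 1777],
)
X = df.to_numpy()

K = 3
# Ask for K + 1 neighbors because the query point is returned as its own neighbor.
nbrs = NearestNeighbors(n_neighbors=K + 1).fit(X)

def findsuccess(row_id):
    """Return [id, neighbor id 1, ..., neighbor id K] for one id."""
    i = df.index.get_loc(row_id)  # assumes ids are unique index labels
    idx = nbrs.kneighbors(X[[i]], return_distance=False)[0]
    idx = [j for j in idx if j != i][:K]  # drop the point itself, keep K
    return [row_id] + df.index[idx].tolist()

out = pd.DataFrame(
    [findsuccess(1234)],
    columns=["id"] + [f"Neigh{n}" for n in range(1, K + 1)],
)
csv_text = out.to_csv(index=False)  # pass a file path instead to write a file
print(csv_text)
```

To get an actual Excel file, out.to_excel("neighbors.xlsx", index=False) should work as well, provided an Excel writer engine such as openpyxl is installed. With 5000 rows it may be simpler to call nbrs.kneighbors(X, return_distance=False) once and build the whole table from the result instead of querying one id at a time.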