How can I adapt this code to return the second and third "nearest neighbours"?


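Based on the code in calculating average distance of nearest neighbours in pandas dataframe, how can I adapt it so that the second and third nearest neighbours are returned as new columns? (Or, alternatively, add an adjustable parameter that sets how many neighbours to return):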

import numpy as np 
from sklearn.neighbors import NearestNeighbors
import pandas as pd

def nn(x):
    nbrs = NearestNeighbors(n_neighbors=2, algorithm='auto', metric='euclidean').fit(x)
    distances, indices = nbrs.kneighbors(x)
    return distances, indices

time = [0, 0, 0, 1, 1, 2, 2]
x = [216, 218, 217, 280, 290, 130, 132]
y = [13, 12, 12, 110, 109, 3, 56] 
car = [1, 2, 3, 1, 3, 4, 5]
df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car})

# For each time group: the distances to, and indices of, the nearest neighbours
nns = df.groupby('time')[['x', 'y']].apply(lambda g: nn(g.to_numpy()))

groups = df.groupby('time')
nn_rows = []
# Iterate over the actual group keys rather than assuming they are 0, 1, 2, ...
for t, nn_set in nns.items():
    group = groups.get_group(t)
    for j, tup in enumerate(zip(nn_set[0], nn_set[1])):
        nn_rows.append({'time': t,
                        'car': group.iloc[j]['car'],
                        'nearest_neighbour': group.iloc[tup[1][1]]['car'],
                        'euclidean_distance': tup[0][1]})

nn_df = pd.DataFrame(nn_rows).set_index('time')

nn_df:

time car euclidean_distance nearest_neighbour           
0    1   1.414214           3
0    2   1.000000           3
0    3   1.000000           2
1    1   10.049876          3
1    3   10.049876          1
2    4   53.037722          5
2    5   53.037722          4
python pandas knn nearest-neighbor euclidean-distance
1 Answer

Here is the documentation for NearestNeighbors.

I think your problem can be solved with the NearestNeighbors parameter n_neighbors, which specifies how many nearest neighbours to return.
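A minimal sketch of what n_neighbors changes, assuming the question's nn() helper is given an extra n_neighbors argument (that argument is a hypothetical addition, not part of the original code). It uses the three cars observed at time 0: kneighbors returns one row per query point and one column per returned neighbour.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def nn(x, n_neighbors=2):
    """Hypothetical variant of the question's nn(): fit on x and query the same points.

    n_neighbors includes the point itself.
    """
    nbrs = NearestNeighbors(n_neighbors=n_neighbors, metric='euclidean').fit(x)
    return nbrs.kneighbors(x)


# The three cars observed at time 0 in the question.
pts = np.array([[216, 13], [218, 12], [217, 12]])

for k in (2, 3):
    distances, indices = nn(pts, n_neighbors=k)
    # One row per query point, one column per returned neighbour.
    print(k, distances.shape, indices.shape)
# prints "2 (3, 2) (3, 2)" and then "3 (3, 3) (3, 3)"
```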

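The usual value is 2, when the goal is to find the single nearest neighbour other than the point itself: the closest "neighbour" of every point is always the point itself, because its distance is 0.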
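A small check of that claim on the question's time-0 cars. This sketch relies on scikit-learn's default metric (Minkowski with p=2, i.e. Euclidean distance) and on the points being distinct; with duplicate coordinates, a different point could tie with the query point at distance 0.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

pts = np.array([[216, 13], [218, 12], [217, 12]])  # time-0 cars from the question

distances, indices = NearestNeighbors(n_neighbors=2).fit(pts).kneighbors(pts)

# Column 0: each point finds itself (indices 0, 1, 2) at distance 0.
print(indices[:, 0], distances[:, 0])
# Column 1: the nearest *other* point (indices 2, 2, 1), which is what
# the question's code reads with tup[1][1] and tup[0][1].
print(indices[:, 1], distances[:, 1])
```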

To find the second and third nearest neighbours as well, set n_neighbors to 4. This returns the point itself, followed by its next N-1 (here 3) nearest neighbours.
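A sketch of the parameterised version the question asks for, assuming one pair of columns per neighbour named neighbour_i / distance_i (those column names and the helper add_neighbour_columns are illustrative, not from the original). It requests k + 1 neighbours so that column 0, the point itself, can be skipped. One detail worth knowing: kneighbors raises a ValueError when n_neighbors exceeds the number of fitted points, and the time 1 and time 2 groups in the question contain only two cars each, so n_neighbors=4 would fail there. The sketch therefore clamps n_neighbors to the group size and leaves the missing slots as NaN.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def add_neighbour_columns(df, k=3, by='time', coords=('x', 'y'), label='car'):
    """Hypothetical helper: add neighbour_1..k and distance_1..k columns per `by` group.

    k counts neighbours excluding the point itself, so k + 1 are requested.
    Assumes no duplicate coordinates within a group, so that column 0 of the
    kneighbors result is always the query point itself.
    """
    parts = []
    for _, group in df.groupby(by):
        pts = group[list(coords)].to_numpy()
        labels = group[label].to_numpy()
        # kneighbors raises ValueError if n_neighbors > number of fitted points.
        n = min(k + 1, len(group))
        distances, indices = NearestNeighbors(n_neighbors=n).fit(pts).kneighbors(pts)
        cols = {}
        for i in range(1, k + 1):
            if i < n:
                # indices are positions within the group, so map them to labels.
                cols[f'neighbour_{i}'] = labels[indices[:, i]]
                cols[f'distance_{i}'] = distances[:, i]
            else:
                # Group too small to have an i-th neighbour.
                cols[f'neighbour_{i}'] = np.nan
                cols[f'distance_{i}'] = np.nan
        parts.append(pd.DataFrame(cols, index=group.index))
    # Align the per-group results back onto the original rows by index.
    return df.join(pd.concat(parts))


df = pd.DataFrame({'time': [0, 0, 0, 1, 1, 2, 2],
                   'x': [216, 218, 217, 280, 290, 130, 132],
                   'y': [13, 12, 12, 110, 109, 3, 56],
                   'car': [1, 2, 3, 1, 3, 4, 5]})

result = add_neighbour_columns(df, k=2)
print(result)
```

With k=2 on the question's data, car 1 at time 0 gets neighbour_1 = 3 (distance about 1.414) and neighbour_2 = 2 (about 2.236), while the two-car groups at times 1 and 2 get NaN in neighbour_2 and distance_2. Because NaN is a float, a neighbour column that contains it is stored as float rather than int.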