How to get a 1:1 correspondence using sklearn K-Nearest Neighbors

Question

I am writing an algorithm that matches each person in setA with one person in setB based on similarity of interests, using NearestNeighbors(n_neighbors=1).

Here is what I have so far:

import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

dfA = pd.DataFrame(np.array([[1, 1, 1, 1], [1, 1, 2, 2], [4, 5, 2, 0], [8, 8, 8, 8]]),
                   columns=['interest0', 'interest2', 'interest3', 'interest4'],
                   index=['personA0', 'personA1', 'personA2', 'personA3'])


dfB = pd.DataFrame(np.array([[1, 1, 1, 1], [1, 1, 1, 2], [2, 3, 2, 2], [8, 6, 8, 8]]),
                   columns=['interest0', 'interest2', 'interest3', 'interest4'],
                   index=['personB0', 'personB1', 'personB2', 'personB3'])


# my_dist is a custom distance function (its definition is not shown here)
knn = NearestNeighbors(n_neighbors=1, metric=my_dist).fit(dfA)
# For each person in dfB, look up the single closest person in dfA
distances, indices = knn.kneighbors(dfB)


>>> dfA
          interest0  interest2  interest3  interest4
personA0          1          1          1          1
personA1          1          1          2          2
personA2          4          5          2          0
personA3          8          8          8          8



>>> dfB
          interest0  interest2  interest3  interest4
personB0          1          1          1          1
personB1          1          1          1          2
personB2          2          3          2          2
personB3          8          6          8          8

>>> print("Distances\n\n", distances, "\n\nIndices\n\n", indices)

Distances

 [[0.   ]
 [0.125]
 [1.125]
 [0.5  ]] 

Indices

 [[0]
 [0]
 [1]
 [3]]

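Looking at the output, personB0's top match is personA0 (distance = 0). However, personB1's top match is also personA0 (distance = 0.125)!

I would like to somehow match personB0 with personA0 (since their distance is the smallest), move them to another table, and then rerun the K-Neighbors algorithm, hoping that it now suggests personA1 as the top match for personB1 (since A0 has been removed). I have started writing a for loop to iterate over this, but it is getting rather complicated for me (I have to loop over several different arrays, dataframes, and so on), so I am wondering what the best approach is. I would like a final dataframe like the following, with a 1:1 correspondence: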

  SetA         SetB
personA0     personB0
personA1     personB1
personA2     personB3
personA3     personB2

Answer 1

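You can use a list to keep track of which people have already been matched. In addition, you need a list of neighbors sorted by distance rather than only the single nearest neighbor, which you get by changing the value passed to the n_neighbors parameter.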

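from sklearn.neighbors import NearestNeighbors

# Ask for all of dfB as neighbors, so each row of `indices` ranks every
# person in dfB from closest to farthest for one person in dfA
knn = NearestNeighbors(n_neighbors=len(dfB)).fit(dfB)
distances, indices = knn.kneighbors(dfA)

matched = []  # positions in dfB that are already taken
pairs = []
for indexA, candidatesB in enumerate(indices):
    personA = dfA.index[indexA]
    # Take the closest candidate in dfB that has not been matched yet
    for indexB in candidatesB:
        if indexB not in matched:
            matched.append(indexB)
            personB = dfB.index[indexB]
            pairs.append([personA, personB])
            break

matches = pd.DataFrame(pairs, columns=['SetA', 'SetB'])

The resulting dataframe looks like this: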

       SetA      SetB
0  personA0  personB0
1  personA1  personB1
2  personA2  personB2
3  personA3  personB3

Note that I used the default metric (minkowski with p = 2). If you pass metric=my_dist to NearestNeighbors, the results may differ.
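
The question describes a slightly different procedure: take the closest pair overall, set both people aside, and repeat. Below is a minimal sketch of that idea. It makes some assumptions: it builds the full distance matrix with sklearn's pairwise_distances instead of refitting NearestNeighbors each round, it uses the Euclidean metric because my_dist is not shown, and greedy_one_to_one is a hypothetical helper name, not part of any library.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances

cols = ["interest0", "interest2", "interest3", "interest4"]
dfA = pd.DataFrame(
    [[1, 1, 1, 1], [1, 1, 2, 2], [4, 5, 2, 0], [8, 8, 8, 8]],
    columns=cols,
    index=["personA0", "personA1", "personA2", "personA3"],
)
dfB = pd.DataFrame(
    [[1, 1, 1, 1], [1, 1, 1, 2], [2, 3, 2, 2], [8, 6, 8, 8]],
    columns=cols,
    index=["personB0", "personB1", "personB2", "personB3"],
)


def greedy_one_to_one(df_a, df_b, metric="euclidean"):
    """Hypothetical helper: repeatedly pair the closest two unmatched people."""
    # dist[i, j] is the distance between row i of df_a and row j of df_b
    dist = pairwise_distances(df_a.to_numpy(), df_b.to_numpy(), metric=metric)

    # Visit every (i, j) cell from the smallest distance to the largest
    order = np.argsort(dist, axis=None, kind="stable")
    rows, cols_ = np.unravel_index(order, dist.shape)

    used_a, used_b, pairs = set(), set(), []
    for i, j in zip(rows, cols_):
        if i in used_a or j in used_b:
            continue  # one of the two people is already matched
        used_a.add(i)
        used_b.add(j)
        pairs.append((df_a.index[i], df_b.index[j], dist[i, j]))

    return pd.DataFrame(pairs, columns=["SetA", "SetB", "distance"])


matches = greedy_one_to_one(dfA, dfB)
print(matches)
```

Unlike the loop above, the result here does not depend on the row order of dfA, because the pairs are chosen by distance rather than by position. pairwise_distances also accepts a callable as metric, so a custom function such as my_dist could presumably be passed in its place. If the goal is instead the smallest total distance across all pairs, scipy.optimize.linear_sum_assignment applied to the same distance matrix is an alternative worth trying.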
