How do I make sure scikit-learn's kNN takes my custom distance metric into account?

Question

Following some other answers on SO (here), I applied a custom distance to my kNN model, but I don't think it is being taken into account.

Here is my metric:

import numpy as np
import Levenshtein as lev  # assumed: the python-Levenshtein package
from scipy.spatial.distance import dice, euclidean

def distance_fun(df, text_feat, num_feat):
    # text columns: mean Levenshtein distance
    # numeric columns: Euclidean distance
    # one-hot (categorical) columns: Dice dissimilarity
    text_indices = list(range(len(text_feat)))
    num_indices = list(range(len(text_feat), len(text_feat) + len(num_feat)))
    cat_indices = list(range(len(text_feat) + len(num_feat), len(df.columns)))
    def the_func(x, y):
        text_dist = np.sum([lev.distance(x[i], y[i]) for i in text_indices]) / len(text_feat)
        num_dist = euclidean(x[num_indices], y[num_indices])
        cat_dist = dice(x[cat_indices], y[cat_indices])
        return text_dist + num_dist + cat_dist
    return the_func
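To check the metric on its own, independently of scikit-learn, here is a minimal self-contained sketch of the same idea. Assumptions: a hand-written `levenshtein` stands in for `lev.distance`, `make_mixed_metric` is a hypothetical helper that takes column counts instead of a DataFrame, and the one-hot columns are cast to bool before calling SciPy's `dice`. Called directly, the metric handles a mixed-type (object) row without complaint.

```python
import numpy as np
from scipy.spatial.distance import dice, euclidean


def levenshtein(a, b):
    """Edit distance via dynamic programming (stand-in for lev.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def make_mixed_metric(n_text, n_num, n_cat):
    """Hypothetical helper: same column layout as distance_fun, given counts."""
    text_idx = list(range(n_text))
    num_idx = list(range(n_text, n_text + n_num))
    cat_idx = list(range(n_text + n_num, n_text + n_num + n_cat))

    def metric(x, y):
        text_dist = sum(levenshtein(str(x[i]), str(y[i])) for i in text_idx) / n_text
        num_dist = euclidean(x[num_idx].astype(float), y[num_idx].astype(float))
        cat_dist = dice(x[cat_idx].astype(bool), y[cat_idx].astype(bool))
        return text_dist + num_dist + cat_dist

    return metric


metric = make_mixed_metric(n_text=1, n_num=2, n_cat=2)
row_a = np.array(["kitten", 1.0, 2.0, 1, 0], dtype=object)
row_b = np.array(["sitting", 4.0, 6.0, 1, 1], dtype=object)
print(metric(row_a, row_b))  # 3 (text) + 5.0 (numeric) + 1/3 (one-hot)
```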

Here is my call to the NearestNeighbors model:

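from sklearn.neighbors import NearestNeighbors

# metric: the callable returned by distance_fun(...)
knn = NearestNeighbors(n_neighbors=10,
                       algorithm='auto',
                       metric=metric,
                       ).fit(tranches_transformed)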
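One workaround that keeps a callable metric is to hand scikit-learn only numeric row indices and let the metric look up the real mixed-type rows itself. The sketch below assumes that approach; `rows`, `toy_metric` (a placeholder for `the_func`) and `index_metric` are hypothetical names, and the data is made up.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical mixed-type rows kept outside scikit-learn.
rows = np.array([["a", 0.0], ["a", 0.1], ["b", 5.0], ["b", 5.2]], dtype=object)


def toy_metric(x, y):
    """Placeholder for the_func: text mismatch (0/1) plus absolute numeric gap."""
    return float(x[0] != y[0]) + abs(float(x[1]) - float(y[1]))


def index_metric(a, b):
    """scikit-learn only sees row indices stored as floats; look up the real rows."""
    return toy_metric(rows[int(a[0])], rows[int(b[0])])


# One numeric column holding 0, 1, 2, ... so numeric input validation passes.
X_idx = np.arange(len(rows), dtype=np.float64).reshape(-1, 1)

# algorithm="brute" so the metric is only evaluated on actual rows of X_idx.
knn = NearestNeighbors(n_neighbors=2, algorithm="brute", metric=index_metric).fit(X_idx)
dist, idx = knn.kneighbors(X_idx)
print(idx)
```

The explicit `algorithm="brute"` is a deliberate choice: tree-based algorithms can evaluate the metric on points that are not rows of `X_idx` (for example ball-tree node centroids), which would break the index lookup.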

where tranches_transformed contains text in the first column and float values everywhere else (a mix of numeric features and one-hot-encoded (OHE) features).
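An array laid out like this has NumPy dtype `object`. The sketch below (made-up placeholder rows; `float_cast_error` is a hypothetical helper) shows that casting such an array to `float64` fails with the same kind of message as the error reported here, while the numeric columns alone cast fine. It assumes scikit-learn performs a comparable float conversion when it validates `X`, which the error message suggests.

```python
import numpy as np

# Hypothetical rows shaped like tranches_transformed: one text column,
# then numeric and one-hot columns. Mixing types forces dtype=object.
rows = np.array(
    [["my first text value", 0.5, 1.2, 1.0, 0.0],
     ["another text value", 0.7, 0.9, 0.0, 1.0]],
    dtype=object,
)


def float_cast_error(arr):
    """Return the error message from a float64 cast, or None if it succeeds."""
    try:
        arr.astype(np.float64)
    except ValueError as exc:
        return str(exc)
    return None


print(rows.dtype)               # object
print(float_cast_error(rows))
```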

When I run it, I still get the following error:

ValueError: could not convert string to float: 'my first text value'

Do all vectors have to be floats even when a custom distance is provided?
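`NearestNeighbors` appears to validate `X` as numeric even when `metric` is a callable, so the text column is rejected before the metric is ever called. A documented alternative is `metric="precomputed"`: compute the full distance matrix yourself with the custom function and pass that matrix as `X`. A minimal sketch under that assumption; `toy_metric` is a hypothetical placeholder for `the_func`, `pairwise_matrix` is a hypothetical helper, and the rows are made up.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical mixed-type rows (text first, then a number).
rows = np.array([["a", 0.0], ["a", 0.1], ["b", 5.0], ["b", 5.2]], dtype=object)


def toy_metric(x, y):
    """Placeholder for the_func: text mismatch (0/1) plus absolute numeric gap."""
    return float(x[0] != y[0]) + abs(float(x[1]) - float(y[1]))


def pairwise_matrix(data, metric):
    """n x n matrix of custom distances; assumes metric is symmetric and d(x, x) = 0."""
    n = len(data)
    dist = np.zeros((n, n), dtype=np.float64)
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = metric(data[i], data[j])
    return dist


D = pairwise_matrix(rows, toy_metric)
knn = NearestNeighbors(n_neighbors=2, metric="precomputed").fit(D)
dist, idx = knn.kneighbors(D)  # each row: distances from one query to all fitted rows
print(idx)
```

To query new points, pass a matrix of distances from each query to every fitted row, shape (n_queries, n_fitted). This costs one metric call per pair, so it suits moderately sized data.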

