Different results from a function and from its contents


I am trying to understand how the fast_knn function of the impyute library works, so I tried executing it line by line to see what it does. Here it is:

import numpy as np
from scipy.spatial import KDTree
from impyute import mean  # impyute's column-mean imputation, used to fill NaNs before building the KDTree

def shepards(distances, power=2):
    # Inverse-distance weighting: closer neighbours receive larger weights
    return to_percentage(1/np.power(distances, power))

def to_percentage(vec):
    # Normalise a vector so that its entries sum to 1
    return vec/np.sum(vec)

data_temp = np.arange(25).reshape((5, 5)).astype(float)
data_temp[0][2] =  np.nan
k=4
eps=0
p=2
distance_upper_bound=np.inf
leafsize=10
idw_fn=shepards
init_impute_fn=mean

nan_xy = np.argwhere(np.isnan(data_temp))   # coordinates of the missing values
data_temp_c = init_impute_fn(data_temp)     # mean-imputed data used to build the KDTree
kdtree = KDTree(data_temp_c, leafsize=leafsize)
for x_i, y_i in nan_xy:
    distances, indices = kdtree.query(data_temp_c[x_i], k=k+1, eps=eps,
                                      p=p, distance_upper_bound=distance_upper_bound)
    # Will always return itself in the first index. Delete it.
    distances, indices = distances[1:], indices[1:]
    # Add small constant to distances to avoid division by 0
    distances += 1e-3
    weights = idw_fn(distances)
    # Assign missing value the weighted average of `k` nearest neighbours
    data_temp[x_i][y_i] = np.dot(weights, [data_temp_c[ind][y_i] for ind in indices])
data_temp

This outputs:

array([[ 0.        ,  1.        , 10.06569379,  3.        ,  4.        ],
       [ 5.        ,  6.        ,  7.        ,  8.        ,  9.        ],
       [10.        , 11.        , 12.        , 13.        , 14.        ],
       [15.        , 16.        , 17.        , 18.        , 19.        ],
       [20.        , 21.        , 22.        , 23.        , 24.        ]])

But the library function itself produces a different output. Code:

from impyute import fast_knn
import numpy as np
data_temp = np.arange(25).reshape((5, 5)).astype(float)
data_temp[0][2] =  np.nan
fast_knn(data_temp, k=4)

And the output:

array([[ 0.        ,  1.        , 16.78451885,  3.        ,  4.        ],
       [ 5.        ,  6.        ,  7.        ,  8.        ,  9.        ],
       [10.        , 11.        , 12.        , 13.        , 14.        ],
       [15.        , 16.        , 17.        , 18.        , 19.        ],
       [20.        , 21.        , 22.        , 23.        , 24.        ]])

python python-3.x function knn imputation
2 Answers

0 votes

There seems to be a discrepancy between the code in the GitHub repository and the installed library's source code (the repository and the released package are out of sync). Below is the library's source code:

# Excerpt from the installed impyute package; find_null and mean are helper
# functions defined elsewhere in impyute.
def fast_knn(data, k=3, eps=0, p=2, distance_upper_bound=np.inf, leafsize=10, **kwargs):
    null_xy = find_null(data)   # coordinates of the missing values
    data_c = mean(data)         # mean-imputed data used to build the KDTree
    kdtree = KDTree(data_c, leafsize=leafsize)

    for x_i, y_i in null_xy:
        distances, indices = kdtree.query(data_c[x_i], k=k+1, eps=eps,
                                          p=p, distance_upper_bound=distance_upper_bound)
        # Will always return itself in the first index. Delete it.
        distances, indices = distances[1:], indices[1:]
        weights = distances/np.sum(distances)   # weights proportional to distance (no inverse-distance weighting)
        # Assign missing value the weighted average of `k` nearest neighbours
        data[x_i][y_i] = np.dot(weights, [data_c[ind][y_i] for ind in indices])
    return data

The weights are computed differently (the shepards function is not used), hence the difference in the output.
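
To see that this weighting difference alone accounts for the two numbers, here is a minimal standalone sketch (not impyute code; it recomputes the neighbour distances for the missing cell at row 0, column 2 directly instead of via KDTree):

import numpy as np

data = np.arange(25).reshape((5, 5)).astype(float)
data[0][2] = np.nan

# Mean-impute the missing cell (column mean = 14.5) so distances can be computed,
# as fast_knn does internally before building the KDTree.
data_c = data.copy()
data_c[0][2] = np.nanmean(data[:, 2])

# Euclidean distances from row 0 to its 4 nearest neighbours (rows 1-4),
# and those neighbours' values in the missing column.
distances = np.linalg.norm(data_c[1:] - data_c[0], axis=1)
neighbour_vals = data_c[1:, 2]

# release/0.0.8 behaviour: weights proportional to distance
w_release = distances / distances.sum()
print(w_release @ neighbour_vals)   # ~16.7845, matches fast_knn(data_temp, k=4)

# master behaviour: shepards (inverse-distance) weights with a small epsilon
d = distances + 1e-3
w_master = (1 / d**2) / (1 / d**2).sum()
print(w_master @ neighbour_vals)    # ~10.0657, matches the manual loop in the question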


0 votes

Perhaps you used the code on the current master branch. However, the version of the impyute package you have installed is probably v0.0.8, and its code is on the release/0.0.8 branch, where fast_knn is defined differently.

On the release/0.0.8 branch, fast_knn computes the weights like this:
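
# Will always return itself in the first index. Delete it.
distances, indices = distances[1:], indices[1:]
weights = distances/np.sum(distances)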

On the master branch:

# Will always return itself in the first index. Delete it.
distances, indices = distances[1:], indices[1:]
# Add small constant to distances to avoid division by 0
distances += 1e-3
weights = idw_fn(distances)

If you use the code from the release/0.0.8 branch, you will get the same result as with the installed impyute package.
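
As a quick way to confirm which branch your installed package corresponds to, you can print the installed version; a minimal sketch using only the standard library (Python 3.8+):

from importlib.metadata import version

# Prints the installed impyute release, e.g. "0.0.8", which corresponds to the
# release/0.0.8 branch rather than master.
print(version("impyute"))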
