How to find the mean of the k nearest points when some values are NaN?


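I have a geometric dataset of point features, each associated with a value. Of roughly 16,000 values, about 100-200 are NaN. I want to fill these with the mean of the 5 nearest neighbors, assuming at least 1 of them is not NaN. The dataset looks like: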

    FID      PPM_P                    geometry
0     0        NaN  POINT (-89.79635 35.75644)
1     1        NaN  POINT (-89.79632 35.75644)
2     2        NaN  POINT (-89.79629 35.75644)
3     3        NaN  POINT (-89.79625 35.75644)
4     4        NaN  POINT (-89.79622 35.75644)
5     5        NaN  POINT (-89.79619 35.75644)
6     6        NaN  POINT (-89.79616 35.75644)
7     7        NaN  POINT (-89.79612 35.75645)
8     8        NaN  POINT (-89.79639 35.75641)
9     9  40.823028  POINT (-89.79635 35.75641)
10   10  40.040865  POINT (-89.79632 35.75641)
11   11  36.214436  POINT (-89.79629 35.75641)
12   12  34.919571  POINT (-89.79625 35.75642)
13   13        NaN  POINT (-89.79622 35.75642)
14   14        NaN  POINT (-89.79619 35.75642)
15   15        NaN  POINT (-89.79615 35.75642)
16   16        NaN  POINT (-89.79612 35.75642)
17   17        NaN  POINT (-89.79609 35.75642)
18   18        NaN  POINT (-89.79606 35.75642)
19   19        NaN  POINT (-89.79642 35.75638)

It so happens that many of the NaNs are located near the beginning of the dataset.

I built a nearest-neighbor weights matrix using:

from libpysal.weights import KNN
w_knn = KNN.from_dataframe(predictions_gdf, k=5)

Next I wrote:

import libpysal

# row-normalise weights
w_knn.transform = "r"

# create lag
predictions_gdf["averaged_PPM_P"] = libpysal.weights.lag_spatial(w_knn, predictions_gdf["PPM_P"])

But I get NaN in the averaged_PPM_P column. Now I am not sure what to do. Can anyone help?
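A likely cause of the NaN results, offered as an assumption rather than something confirmed against the libpysal source: the weights are built from every point, including those whose PPM_P is NaN, and a row-normalised lag is a weighted sum, so one NaN neighbor is enough to make the sum NaN. The sketch below mimics that with plain NumPy. The four values and the `neighbors` index lists are made up for illustration and are not libpysal's internal representation.

```python
import numpy as np

# Made-up values for four points; point 0 has no measurement.
y = np.array([np.nan, 10.0, 20.0, 30.0])

# Hypothetical k=2 neighbor lists: row i holds the indices of i's neighbors.
neighbors = np.array([[1, 2], [0, 2], [1, 3], [2, 1]])

# With row-normalised, equal weights the lag is the plain neighbor mean,
# so a single NaN neighbor makes the whole result NaN.
plain_lag = y[neighbors].mean(axis=1)

# Ignoring NaN neighbors yields a usable average instead.
nan_aware_lag = np.nanmean(y[neighbors], axis=1)

print(plain_lag)
print(nan_aware_lag)
```

Only point 1 has the NaN point among its neighbors, so it is the only entry of `plain_lag` that comes out as NaN, while `nan_aware_lag` stays finite everywhere.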

geopandas
1 Answer

Here is one possible option using cKDTree.query from scipy:

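import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

def knearest(gdf, **kwargs):
    # Rows that already have a PPM_P value serve as candidate neighbors.
    notna = gdf["PPM_P"].notnull()

    # (x, y) coordinates of the known points and of the points to fill.
    arr_geom1 = np.c_[
        gdf.loc[notna, "geometry"].x,
        gdf.loc[notna, "geometry"].y,
    ]
    arr_geom2 = np.c_[
        gdf.loc[~notna, "geometry"].x,
        gdf.loc[~notna, "geometry"].y,
    ]

    # For each point to fill, look up its nearest known points;
    # extra keyword arguments (e.g. k) are passed on to query().
    dist, idx = cKDTree(arr_geom1).query(arr_geom2, **kwargs)

    # Known values of those neighbors, indexed by the rows being filled.
    _ser = pd.Series(
        gdf.loc[notna, "PPM_P"].to_numpy()[idx].tolist(),
        index=gdf.index[~notna],
    )

    # Average the neighbor values and store them in a new column.
    gdf.loc[~notna, "PPM_P (INTER)"] = _ser.map(np.mean)

    return gdf

N = 2  # feel free to make it 5, or whatever..

# Reproject to a projected CRS so distances are not computed in degrees.
out = knearest(gdf.to_crs(3662), k=range(1, N + 1))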

Output (N=2):

NB: each red point (an FID with a null PPM_P) is associated with N green points.

GeoDataFrame (with intermediates):

    FID  PPM_P (OP)  PPM_P (INTER)      PPM_P                       geometry
0     0   34.919571            NaN  34.919571  POINT (842390.581 539861.877)
1     1         NaN      37.480218  37.480218  POINT (842399.476 539861.532)
2     2         NaN      35.567003  35.567003  POINT (842408.370 539861.187)
3     3         NaN      35.567003  35.567003  POINT (842420.229 539860.726)
4     4   36.214436            NaN  36.214436  POINT (842429.124 539860.381)
5     5         NaN      38.127651  38.127651  POINT (842438.018 539860.036)
6     6         NaN      40.431946  40.431946  POINT (842446.913 539859.691)
7     7   40.823028            NaN  40.823028  POINT (842458.913 539862.868)
8     8         NaN      37.871299  37.871299  POINT (842378.298 539851.425)
9     9   40.823028            NaN  40.823028  POINT (842390.158 539850.965)
10   10   40.040865            NaN  40.040865  POINT (842399.052 539850.620)
11   11   36.214436            NaN  36.214436  POINT (842407.947 539850.275)
12   12   34.919571            NaN  34.919571  POINT (842419.947 539853.452)
13   13         NaN      38.127651  38.127651  POINT (842428.841 539853.107)
14   14   40.040865            NaN  40.040865  POINT (842437.736 539852.761)
15   15         NaN      40.431946  40.431946  POINT (842449.595 539852.301)
16   16         NaN      40.431946  40.431946  POINT (842458.489 539851.956)
17   17         NaN      40.431946  40.431946  POINT (842467.384 539851.611)
18   18         NaN      40.431946  40.431946  POINT (842476.278 539851.266)
19   19         NaN      37.871299  37.871299  POINT (842368.981 539840.859)
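Note that the function above averages the k nearest points that have a value, whereas the question asks for the mean of the 5 nearest neighbors of any kind, skipping those that are NaN. A minimal sketch of that second reading follows, working on plain coordinate and value arrays. The function name `fill_with_neighbor_mean` is hypothetical, and the sketch assumes the coordinates are already in a projected CRS and that there are more than k points.

```python
import numpy as np
from scipy.spatial import cKDTree


def fill_with_neighbor_mean(coords, values, k=5):
    """Fill NaN entries with the NaN-ignoring mean of the k nearest points.

    Hypothetical helper; assumes len(coords) > k.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)

    # The tree holds all points, so each query point is returned as its own
    # nearest neighbor. Asking for k + 1 leaves k real neighbors; the point's
    # own value is NaN and is therefore ignored by nanmean.
    _, idx = cKDTree(coords).query(coords[missing], k=k + 1)

    filled = values.copy()
    # Rows whose neighbors are all NaN stay NaN (NumPy emits a RuntimeWarning).
    filled[missing] = np.nanmean(values[idx], axis=1)
    return filled


# Made-up example: six points on a line, three of them without a value.
coords = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]]
values = [np.nan, np.nan, 10.0, 20.0, np.nan, 40.0]
print(fill_with_neighbor_mean(coords, values, k=2))
```

With a GeoDataFrame, the inputs could be built as `np.c_[gdf.geometry.x, gdf.geometry.y]` and `gdf["PPM_P"].to_numpy()`. Points whose neighbors are all NaN remain NaN, so a second pass or a larger k may be needed for dense clusters of missing values such as the one at the start of the dataset.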