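I have a geometric dataset of point features, each associated with a value. Of roughly 16,000 values, about 100-200 are NaN. I would like to fill these with the average of the 5 nearest neighbors, assuming at least 1 of them is not NaN. The dataset looks like: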
    FID      PPM_P  geometry
0     0        NaN  POINT (-89.79635 35.75644)
1     1        NaN  POINT (-89.79632 35.75644)
2     2        NaN  POINT (-89.79629 35.75644)
3     3        NaN  POINT (-89.79625 35.75644)
4     4        NaN  POINT (-89.79622 35.75644)
5     5        NaN  POINT (-89.79619 35.75644)
6     6        NaN  POINT (-89.79616 35.75644)
7     7        NaN  POINT (-89.79612 35.75645)
8     8        NaN  POINT (-89.79639 35.75641)
9     9  40.823028  POINT (-89.79635 35.75641)
10   10  40.040865  POINT (-89.79632 35.75641)
11   11  36.214436  POINT (-89.79629 35.75641)
12   12  34.919571  POINT (-89.79625 35.75642)
13   13        NaN  POINT (-89.79622 35.75642)
14   14        NaN  POINT (-89.79619 35.75642)
15   15        NaN  POINT (-89.79615 35.75642)
16   16        NaN  POINT (-89.79612 35.75642)
17   17        NaN  POINT (-89.79609 35.75642)
18   18        NaN  POINT (-89.79606 35.75642)
19   19        NaN  POINT (-89.79642 35.75638)
It just so happens that many of the NaNs are near the beginning of the dataset.
I built the nearest-neighbor weights matrix with:
w_knn = KNN.from_dataframe(predictions_gdf, k=5)
Next I wrote:
# row-normalise weights
w_knn.transform = "r"
# create lag
predictions_gdf["averaged_PPM_P"] = libpysal.weights.lag_spatial(w_knn, predictions_gdf["PPM_P"])
But I get NaN in the averaged_PPM_P column. Now I don't know what to do. Can anyone help?
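The NaNs in the new column are most likely a consequence of how a spatial lag is computed: with row-standardised weights it is the weighted average of each point's neighbours, so a single NaN neighbour turns the whole average into NaN. The sketch below imitates this with plain NumPy on made-up values and hand-written neighbour lists (both hypothetical, not taken from the dataset) and contrasts it with a NaN-ignoring mean.

```python
import numpy as np

# Hypothetical toy data: 4 points, point 0 has no value.
values = np.array([np.nan, 10.0, 20.0, 30.0])

# Hypothetical neighbour lists (2 neighbours per point), standing in for a KNN weights object.
neighbors = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [1, 2]}

# Plain average of neighbour values, which is what a row-standardised lag amounts to.
lag = np.array([values[n].mean() for n in neighbors.values()])

# Same average, but skipping NaN neighbours.
nan_aware = np.array([np.nanmean(values[n]) for n in neighbors.values()])

print(lag)        # point 1 is NaN because one of its neighbours (point 0) is NaN
print(nan_aware)  # point 1 now uses only its valid neighbour
```

So the weights themselves are fine; what is needed is an average that either skips NaN neighbours or searches only among points that have a value.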
Here is one possible option using cKDTree.query from scipy:
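import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

def knearest(gdf, **kwargs):
    # True where PPM_P has a value, False where it is NaN
    notna = gdf["PPM_P"].notnull()
    # coordinates of the points with a known value (the tree is built on these)
    arr_geom1 = np.c_[
        gdf.loc[notna, "geometry"].x,
        gdf.loc[notna, "geometry"].y,
    ]
    # coordinates of the points whose value has to be filled
    arr_geom2 = np.c_[
        gdf.loc[~notna, "geometry"].x,
        gdf.loc[~notna, "geometry"].y,
    ]
    # idx holds, for each NaN point, the positions of its nearest known points
    dist, idx = cKDTree(arr_geom1).query(arr_geom2, **kwargs)
    # one list of neighbour values per NaN point, indexed like the NaN rows
    _ser = pd.Series(
        gdf.loc[notna, "PPM_P"].to_numpy()[idx].tolist(),
        index=(~notna)[lambda s: s].index,
    )
    gdf.loc[~notna, "PPM_P (INTER)"] = _ser.map(np.mean)
    return gdf

N = 2  # feel free to make it 5, or whatever..
# reproject first so that distances are measured in projected units, not degrees
out = knearest(gdf.to_crs(3662), k=range(1, N + 1))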
Output (with N=2):
NB: each red point (an FID with a null PPM_P) is associated with N green points.
GeoDataFrame (with intermediates):
    FID  PPM_P (OP)  PPM_P (INTER)      PPM_P  geometry
0     0   34.919571            NaN  34.919571  POINT (842390.581 539861.877)
1     1         NaN      37.480218  37.480218  POINT (842399.476 539861.532)
2     2         NaN      35.567003  35.567003  POINT (842408.370 539861.187)
3     3         NaN      35.567003  35.567003  POINT (842420.229 539860.726)
4     4   36.214436            NaN  36.214436  POINT (842429.124 539860.381)
5     5         NaN      38.127651  38.127651  POINT (842438.018 539860.036)
6     6         NaN      40.431946  40.431946  POINT (842446.913 539859.691)
7     7   40.823028            NaN  40.823028  POINT (842458.913 539862.868)
8     8         NaN      37.871299  37.871299  POINT (842378.298 539851.425)
9     9   40.823028            NaN  40.823028  POINT (842390.158 539850.965)
10   10   40.040865            NaN  40.040865  POINT (842399.052 539850.620)
11   11   36.214436            NaN  36.214436  POINT (842407.947 539850.275)
12   12   34.919571            NaN  34.919571  POINT (842419.947 539853.452)
13   13         NaN      38.127651  38.127651  POINT (842428.841 539853.107)
14   14   40.040865            NaN  40.040865  POINT (842437.736 539852.761)
15   15         NaN      40.431946  40.431946  POINT (842449.595 539852.301)
16   16         NaN      40.431946  40.431946  POINT (842458.489 539851.956)
17   17         NaN      40.431946  40.431946  POINT (842467.384 539851.611)
18   18         NaN      40.431946  40.431946  POINT (842476.278 539851.266)
19   19         NaN      37.871299  37.871299  POINT (842368.981 539840.859)
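Note that the function above averages the N nearest points that have a value, which is slightly different from what the question describes (the 5 nearest points of any kind, averaging whichever of them are not NaN). A minimal sketch of that second reading is below. It works on plain NumPy arrays and assumes projected coordinates; the function name fill_with_neighbor_mean and the toy data are hypothetical, made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree


def fill_with_neighbor_mean(xy, values, k=5):
    """Fill each NaN with the mean of the non-NaN values among its k nearest points.

    xy     : (n, 2) array of coordinates (assumption: projected, planar units)
    values : (n,) array of floats, NaN where the value is missing
    """
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    missing = np.flatnonzero(np.isnan(values))
    if missing.size == 0:
        return filled

    # Query k + 1 points from a tree built on ALL points, because the nearest
    # hit for each query point is normally the point itself.
    k_eff = min(k + 1, len(xy))
    _, idx = cKDTree(xy).query(xy[missing], k=k_eff)
    idx = idx.reshape(missing.size, -1)

    neigh_vals = values[idx]
    # The point itself is NaN, so it never contributes. If it was not returned
    # (exact duplicate coordinates), drop the farthest hit to keep k neighbours.
    no_self = ~(idx == missing[:, None]).any(axis=1)
    neigh_vals[no_self, -1] = np.nan

    # NaN-ignoring mean per row; rows with no valid neighbour stay NaN.
    valid = ~np.isnan(neigh_vals)
    counts = valid.sum(axis=1)
    sums = np.where(valid, neigh_vals, 0.0).sum(axis=1)
    has_any = counts > 0
    filled[missing[has_any]] = sums[has_any] / counts[has_any]
    return filled


# Toy example: 10 points on a line, two of them without a value.
xy = np.c_[np.arange(10.0), np.zeros(10)]
values = np.array([np.nan, 1.0, 2.0, 3.0, 4.0, 5.0, np.nan, 7.0, 8.0, 9.0])
filled = fill_with_neighbor_mean(xy, values, k=2)
print(filled)
```

With a GeoDataFrame, xy could be built the same way as in knearest, for example np.c_[gdf.geometry.x, gdf.geometry.y] after reprojecting. Points whose k nearest neighbours are all NaN are left as NaN, which is likely for tight clusters of missing values like the one at the start of the dataset; in that case searching only among points with a value, as knearest does, may be the better fit.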