I have a dataframe like this:
index | place_id | var_lat_fact | var_lon_fact |
---|---|---|---|
0 | 167312091448 | 5.6679820000 | -0.0144950000 |
1 | 167312091448 | 5.6686320000 | -0.0157910000 |
2 | 167312091448 | 5.6653530000 | -0.0181980000 |
3 | 167312091448 | 5.6700970000 | -0.0191400000 |
4 | 167312091448 | 5.6689810000 | -0.0104040000 |
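For each coordinate pair (lat, lon), I would like to compute the Euclidean distance to its nearest neighbour within the dataframe. Each point should then get a measure in an additional column (e.g. nearest_neighbour_dist) giving that distance in meters.
Something like this:
index | place_id | var_lat_fact | var_lon_fact | nearest_neighbour_dist |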
---|---|---|---|---|
0 | 167312091448 | 5.6679820000 | -0.0144950000 | 123 |
1 | 167312091448 | 5.6686320000 | -0.0157910000 | 342 |
2 | 167312091448 | 5.6653530000 | -0.0181980000 | 312 |
3 | 167312091448 | 5.6700970000 | -0.0191400000 | 42 |
4 | 167312091448 | 5.6689810000 | -0.0104040000 | 23 |
I really can't wrap my head around this problem... Any help would be much appreciated.
import pandas as pd
from io import StringIO
from scipy.spatial import KDTree
# Load test data
s = """
place_id,var_lat_fact,var_lon_fact
167312091448 5.6679820000 -0.0144950000
167312091448 5.6686320000 -0.0157910000
167312091448 5.6653530000 -0.0181980000
167312091448 5.6700970000 -0.0191400000
167312091448 5.6689810000 -0.0104040000
""".replace(' ', ',')
df = pd.read_csv(StringIO(s))
# Create kd Tree
points = df[['var_lat_fact', 'var_lon_fact']].values
kd = KDTree(points)
# Compute the closest two neighbors for each point
distances, indexes = kd.query(points, k=2)
# Discard the first 'neighbor' (the point itself, i.e. distance=0)
# and select the second.
# Note: these distances are in the units of the input columns
# (decimal degrees), not meters.
df['nearest_neighbour_dist'] = distances[:, 1]
print(df)
place_id var_lat_fact var_lon_fact nearest_neighbour_dist
0 167312091448 5.667982 -0.014495 0.001450
1 167312091448 5.668632 -0.015791 0.001450
2 167312091448 5.665353 -0.018198 0.004068
3 167312091448 5.670097 -0.019140 0.003655
4 167312091448 5.668981 -0.010404 0.004211
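The distances printed above are plain Euclidean distances in decimal degrees, whereas the question asks for meters. One possible way to get meters, sketched here under the assumption that a spherical Earth with a mean radius of 6,371 km is accurate enough, is to map each (lat, lon) pair onto the unit sphere, run the same `KDTree` query on those 3D points, and convert the resulting chord length into a great-circle distance. The helper name `nearest_neighbour_dist_m` and the constant `EARTH_RADIUS_M` are made up for this illustration and are not part of pandas or SciPy.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree

# Assumption: spherical Earth with the mean radius, in meters.
EARTH_RADIUS_M = 6_371_000


def nearest_neighbour_dist_m(lat_deg, lon_deg):
    """Great-circle distance (meters) from each point to its nearest neighbour."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))

    # Map every point onto the unit sphere (x, y, z).
    xyz = np.column_stack((
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ))

    # k=2: the first hit is the point itself (distance 0), the second is
    # its nearest neighbour. Chord length grows monotonically with
    # great-circle distance, so the nearest neighbour is the same.
    chord, _ = KDTree(xyz).query(xyz, k=2)

    # Convert the unit-sphere chord length to an arc length in meters.
    half_chord = np.clip(chord[:, 1] / 2.0, 0.0, 1.0)
    return 2.0 * EARTH_RADIUS_M * np.arcsin(half_chord)


# Same test data as in the answer above.
df = pd.DataFrame({
    "place_id": [167312091448] * 5,
    "var_lat_fact": [5.667982, 5.668632, 5.665353, 5.670097, 5.668981],
    "var_lon_fact": [-0.014495, -0.015791, -0.018198, -0.019140, -0.010404],
})

df["nearest_neighbour_dist"] = nearest_neighbour_dist_m(
    df["var_lat_fact"], df["var_lon_fact"]
)
print(df)
```

If the dataframe contains exact duplicate coordinates, the nearest neighbour of such a point is its duplicate and the reported distance is 0. For small areas like this one, projecting the coordinates to a metric coordinate system first and keeping the original 2D query would be another reasonable option.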