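I have two different datasets (examples below):
I'm trying to find the minimum distance between each site in dataset 1 and the sites in dataset 2, so that every location in dataset 1 gets a column showing the distance to the nearest site in dataset 2.
This is what I have so far, but I can't get it to work. Any suggestions on how to proceed would be appreciated.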
from geopy import distance
import pandas as pd
s = {
    'site_id': dataset1['site_id'],
    'latitude' : dataset1['latitude'],
    'longitude' : dataset1['longitude']
}
d = {
    'site_id': dataset2['site_id'],
    'latitude' : dataset2['latitude'],
    'longitude' : dataset2['longitude']
}
#s = pd.DataFrame(s)
#d = pd.DataFrame(d)
for (ss, a) in s.items():
    best = None
    dist = None
    for (dd, b) in d.items():
        km = distance.distance(a, b).km
        if dist is None or km < dist:
            best = dd
            dist = km
    print(f'{ss} is nearest {best}: {dist} km')
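You can use the Haversine formula to calculate the distance between two points given their latitude and longitude. Here is an example of how you could modify your code to use it: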
from math import radians, sin, cos, sqrt, atan2
def haversine(lat1, lon1, lat2, lon2):
    # convert decimal degrees to radians
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    # haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    # 6371 km is the mean radius of the Earth
    km = 6371 * c
    return km
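For reference, the function above is the standard haversine formula, with latitudes $\varphi$ and longitudes $\lambda$ in radians and $R \approx 6371$ km taken as the mean Earth radius. It treats the Earth as a sphere, so its results differ slightly from an ellipsoidal distance such as geopy's default `geodesic`:

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c, \qquad R \approx 6371\ \text{km}
\end{aligned}
```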
Then update your loop as follows:
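# iterate over the rows of each DataFrame; s.items() would iterate over the
# dict's column names instead, so `a` would be a whole column, not one site
for a in dataset1.itertuples(index=False):
    best = None
    dist = None
    for b in dataset2.itertuples(index=False):
        km = haversine(a.latitude, a.longitude, b.latitude, b.longitude)
        if dist is None or km < dist:
            best = b.site_id
            dist = km
    print(f'{a.site_id} is nearest {best}: {dist} km')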
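The manual `best`/`dist` bookkeeping can also be written with `min()` and a `key` function. This is a minimal, self-contained sketch: `Row` is a hypothetical namedtuple standing in for the rows that `DataFrame.itertuples(index=False)` produces (assuming columns named `site_id`, `latitude`, `longitude`), `nearest` is an illustrative helper name, and the coordinates are a few rows from the sample tables in this thread.

```python
from collections import namedtuple
from math import radians, sin, cos, sqrt, atan2

# Hypothetical stand-in for a row from DataFrame.itertuples(index=False),
# assuming the columns are named site_id, latitude, longitude.
Row = namedtuple("Row", ["site_id", "latitude", "longitude"])


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * atan2(sqrt(a), sqrt(1 - a))


def nearest(src, targets):
    """Return (closest target row, distance in km) for one source row."""
    def km_to(t):
        return haversine(src.latitude, src.longitude, t.latitude, t.longitude)

    best = min(targets, key=km_to)  # min() replaces the best/dist bookkeeping
    return best, km_to(best)


# A few rows copied from the sample tables.
sources = [Row("Site1", 51.8236, -3.01961), Row("Site3", 57.1478, -2.098)]
targets = [
    Row("SiteA", 51.8313, -2.27422),
    Row("SiteB", 50.4891, -3.55259),
    Row("SiteD", 57.1492, -2.08277),
    Row("SiteF", 57.1558, -2.11278),
]

for s in sources:
    t, km = nearest(s, targets)
    print(f"{s.site_id} is nearest {t.site_id}: {km:.2f} km")
```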
It looks like your DataFrames also have a column called `site_id`, since you are reading it into the `s` and `d` variables:
s = {
    'site_id': dataset1['site_id'],
    'latitude' : dataset1['latitude'],
    'longitude' : dataset1['longitude']
}
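Iterating over `s.items()` yields the dictionary's keys (`'site_id'`, `'latitude'`, `'longitude'`) paired with entire columns, so it looks like you end up passing whole columns, starting with `site_id`, into the distance call: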
km = distance.distance(a, b).km
Also, `a` and `b` need to be (latitude, longitude) tuples, which is not what you get when extracting values from `s` and `d`.
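A small illustration of the problem, using plain lists in place of pandas Series (a hypothetical two-site version of `s`): looping over the dict's `items()` visits one entry per column, not one per site.

```python
# Hypothetical mini version of the question's `s` dict: one key per column,
# each value holding the whole column (plain lists instead of pandas Series).
s = {
    "site_id": ["Site1", "Site2"],
    "latitude": [51.8236, 52.4157],
    "longitude": [-3.01961, -4.08358],
}

# Looping over s.items() visits the three column names, not the sites,
# so `a` is an entire column on each pass (the site_id column first).
pairs = list(s.items())
for key, a in pairs:
    print(key, a)
```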
Do your DataFrames look more like this?
DF1:

| site_id | latitude | longitude |
|---------|----------|-----------|
| Site1   | 51.8236  | -3.019610 |
| Site2   | 52.4157  | -4.083580 |
| Site3   | 57.1478  | -2.098000 |
| Site4   | 56.4617  | -2.991410 |
| Site5   | 51.2490  | -0.764848 |
| Site6   | 57.1438  | -2.109280 |
| Site7   | 51.6707  | -1.282660 |

DF2:

| site_id | latitude | longitude |
|---------|----------|-----------|
| SiteA   | 51.8313  | -2.27422  |
| SiteB   | 50.4891  | -3.55259  |
| SiteC   | 56.5792  | -3.34735  |
| SiteD   | 57.1492  | -2.08277  |
| SiteE   | 57.2875  | -2.37346  |
| SiteF   | 57.1558  | -2.11278  |
| SiteG   | 57.1967  | -2.09314  |
| SiteH   | 57.1538  | -2.27820  |
| SiteI   | 53.7527  | -2.36054  |
| SiteJ   | 55.8659  | -3.97845  |
If so, you want to create two dictionaries in which `site_id` is the key and a (latitude, longitude) tuple is the value, like `s_dict` and `d_dict` below, for example:
s_dict = {
    'Site1': (51.8236, -3.01961),
    'Site2': (52.4157, -4.08358),
    'Site3': (57.1478, -2.098),
    'Site4': (56.4617, -2.99141),
    ...
}
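One way to build such a dictionary from column data is to zip the coordinate columns together. This sketch uses plain lists holding the first four DF1 rows; with pandas, `dict(zip(dataset1['site_id'], zip(dataset1['latitude'], dataset1['longitude'])))` should build the same mapping, since iterating over a Series yields its values.

```python
# Columns as plain lists (values copied from the DF1 sample table).
site_id = ["Site1", "Site2", "Site3", "Site4"]
latitude = [51.8236, 52.4157, 57.1478, 56.4617]
longitude = [-3.01961, -4.08358, -2.098, -2.99141]

# zip the coordinate columns into (lat, lon) tuples, then pair each tuple
# with its site_id to get {site_id: (lat, lon)}.
s_dict = dict(zip(site_id, zip(latitude, longitude)))
print(s_dict)
```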
You can then take each source site's latitude/longitude tuple, compare it against every target tuple, and keep the shortest distance.
from geopy import distance
import pandas as pd
# dataset1 and dataset2 are assumed to be already loaded as DataFrames
# with columns in the order: site_id, latitude, longitude
# build {site_id: (latitude, longitude)} for each DataFrame
s_dict = {x[0]: (x[1:]) for x in dataset1.itertuples(index=False)}
d_dict = {x[0]: (x[1:]) for x in dataset2.itertuples(index=False)}
for s_site, s in s_dict.items():
    print(f'Checking site: {s_site} Co-Ords: {s}')
    best = None
    dist = None
    for d_site, d in d_dict.items():
        km = distance.distance(s, d).km
        print(f'Comparing {s_site} to {d_site}, Co-ords: {d}, Distance: {km}')
        if dist is None or km < dist:
            best = d_site
            dist = km
    print(f'{s_site}: The nearest site is {best}: {dist} km')
This should produce output like the following, with a printed line for each comparison:
Checking site: Site1 Co-Ords: (51.8236, -3.01961)
Comparing Site1 to SiteA, Co-ords: (51.8313, -2.27422), Distance: 51.39541157179988
Comparing Site1 to SiteB, Co-ords: (50.4891, -3.55259), Distance: 153.07461731514346
Comparing Site1 to SiteC, Co-ords: (56.5792, -3.34735), Distance: 529.7691869147437
Comparing Site1 to SiteD, Co-ords: (57.1492, -2.08277), Distance: 595.8983925872216
Comparing Site1 to SiteE, Co-ords: (57.2875, -2.37346), Distance: 609.6418219993236
Comparing Site1 to SiteF, Co-ords: (57.1558, -2.11278), Distance: 596.4352710313524
Comparing Site1 to SiteG, Co-ords: (57.1967, -2.09314), Distance: 601.0900671080333
Comparing Site1 to SiteH, Co-ords: (57.1538, -2.2782), Distance: 595.2575358222653
Comparing Site1 to SiteI, Co-ords: (53.7527, -2.36054), Distance: 219.22839170745868
Comparing Site1 to SiteJ, Co-ords: (55.8659, -3.97845), Distance: 454.30856091686644
Site1: The nearest site is SiteA: 51.39541157179988 km
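Since the original goal is a column of nearest distances for dataset 1, it may be more useful to collect the results than to print them. This is a sketch under a few assumptions: `nearest_sites` and `haversine_km` are hypothetical helpers, haversine is swapped in for geopy so the block runs with the standard library alone (its figures therefore differ slightly from the geodesic output above), and the dictionaries hold only a few of the sample sites.

```python
from math import radians, sin, cos, sqrt, atan2


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) tuples; R = 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * atan2(sqrt(a), sqrt(1 - a))


def nearest_sites(s_dict, d_dict, distance_fn=haversine_km):
    """Map each source site_id to (nearest target site_id, distance in km)."""
    result = {}
    for s_site, s in s_dict.items():
        # pick the target key whose coordinates are closest to s
        d_site = min(d_dict, key=lambda k: distance_fn(s, d_dict[k]))
        result[s_site] = (d_site, distance_fn(s, d_dict[d_site]))
    return result


s_dict = {"Site1": (51.8236, -3.01961), "Site6": (57.1438, -2.10928)}
d_dict = {
    "SiteA": (51.8313, -2.27422),
    "SiteD": (57.1492, -2.08277),
    "SiteF": (57.1558, -2.11278),
}

nearest = nearest_sites(s_dict, d_dict)
print(nearest)
```

To reuse geopy instead, pass `distance_fn=lambda p, q: distance.distance(p, q).km`. The distances could then be attached to the DataFrame with something like `dataset1['nearest_km'] = dataset1['site_id'].map({k: v[1] for k, v in nearest.items()})`.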