I have been working with latitude and longitude (Lat & Long) data.
Background
Actual df:
Index   Latitude            Longitude
0       66.36031097267725   23.714807357485936
1       66.36030099322495   23.71479548193769
...
12053   66.27918383581169   23.568631229948359
Fleet df:
Index   Latitude            Longitude
0       66.34622070356742   23.687960586306179
1       66.34620931053996   23.687951092116624
...
8000    66.28435494603767   23.582387305786561
len(Actual) = 12053  # length of Actual data
len(Fleet) = 8000    # length of Fleet data
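The data above show that the Fleet Lat/Long points cover only a shorter stretch of the path traced by the Actual Lat/Long points.
Note:
The Fleet Lat/Long values are not necessarily equal to the Actual Lat/Long values, but they stay within a shorter section of the Actual Lat/Long track.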
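Requirement
I want to trim the Actual latitude/longitude data down to the section covered by the Fleet Lat/Long data.
When I plot both on OpenStreetMap or with matplotlib, the trimmed Actual data and the Fleet data must follow the same path (the individual points need not coincide).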
What I tried:
I filtered with comparison operators against the Fleet start and end points:
actual_data[(actual_data['Latitude'] <= fleet_data_Lat_start_point)
            & (actual_data['Longitude'] <= fleet_data_Long_start_point)
            & (actual_data['Latitude'] <= fleet_data_Lat_end_point)
            & (actual_data['Longitude'] <= fleet_data_Long_end_point)]
But I could not get the Actual Lat/Long data to match the Fleet Lat/Long data.
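The attempted filter compares each column with `<=` against both the start and the end point, so it only keeps points that are south-west of both endpoints rather than points between them. A minimal sketch of a bounding-box version, assuming the box is simply the min/max extent of the Fleet data (the sample values below are hypothetical, loosely modelled on the rows shown above):

```python
import pandas as pd

# Hypothetical sample data standing in for the real Actual/Fleet DataFrames
actual_data = pd.DataFrame({
    "Latitude":  [66.3603, 66.3462, 66.3200, 66.2844, 66.2792],
    "Longitude": [23.7148, 23.6880, 23.6400, 23.5824, 23.5686],
})
fleet_data = pd.DataFrame({
    "Latitude":  [66.3462, 66.3300, 66.2844],
    "Longitude": [23.6880, 23.6500, 23.5824],
})

# Use the min/max of the Fleet data, so it does not matter whether the
# start point has larger or smaller coordinates than the end point
lat_min, lat_max = fleet_data["Latitude"].min(), fleet_data["Latitude"].max()
lon_min, lon_max = fleet_data["Longitude"].min(), fleet_data["Longitude"].max()

# Keep Actual points that fall inside the Fleet bounding box (inclusive bounds)
in_box = (actual_data["Latitude"].between(lat_min, lat_max)
          & actual_data["Longitude"].between(lon_min, lon_max))
trimmed = actual_data[in_box]
print(trimmed)
```

A bounding box is only a rough filter: if the Actual route curves back into the box outside the Fleet segment, those points are kept too, which is why a nearest-distance approach is more robust for "same path" trimming.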
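Here is my solution: I use the geopy library to compute distances.
You can compute the distance with either geodesic() or great_circle(); in geopy, the function distance is an alias for geodesic.
If you prefer another unit, change .km to .miles, .m, or .ft.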
from geopy.distance import lonlat, distance, great_circle, geodesic

# For each Actual point, find the distance (km) to its nearest Fleet point
dmin = []
for index, r in df_actual.iterrows():
    valmin = df_fleet.apply(lambda x:
                            distance(lonlat(x['Longitude'], x['Latitude']),
                                     lonlat(r['Longitude'], r['Latitude'])).km,
                            axis=1).min()
    dmin.append(valmin)
df_actual['nearest to fleet(km)'] = dmin
print(df_actual)
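Once every Actual point has a "nearest to fleet" distance, trimming reduces to a threshold on that column. The row-by-row `apply` loop above makes roughly 12053 × 8000 geopy calls, so here is a sketch of a vectorized alternative. Assumptions: it swaps geopy's ellipsoidal geodesic for the spherical haversine formula (values differ slightly), the function name `nearest_fleet_km`, the `chunk` size and the 0.1 km tolerance are hypothetical, and the sample data are made up.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def nearest_fleet_km(df_actual, df_fleet, chunk=1000):
    """Haversine distance (km) from each Actual point to its nearest Fleet point."""
    lat2 = np.radians(df_fleet["Latitude"].to_numpy())[None, :]   # shape (1, n_fleet)
    lon2 = np.radians(df_fleet["Longitude"].to_numpy())[None, :]
    out = []
    # Process Actual rows in chunks so the (chunk, n_fleet) matrix stays small
    for start in range(0, len(df_actual), chunk):
        part = df_actual.iloc[start:start + chunk]
        lat1 = np.radians(part["Latitude"].to_numpy())[:, None]   # shape (chunk, 1)
        lon1 = np.radians(part["Longitude"].to_numpy())[:, None]
        a = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
        d = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
        out.append(d.min(axis=1))  # nearest Fleet distance for each row in this chunk
    return np.concatenate(out)

# Hypothetical sample data; the Fleet rows are copies of Actual rows 1-3
df_actual = pd.DataFrame({
    "Latitude":  [66.3603, 66.3462, 66.3200, 66.2844, 66.2792],
    "Longitude": [23.7148, 23.6880, 23.6400, 23.5824, 23.5686],
})
df_fleet = pd.DataFrame({
    "Latitude":  [66.3462, 66.3200, 66.2844],
    "Longitude": [23.6880, 23.6400, 23.5824],
})

df_actual["nearest to fleet(km)"] = nearest_fleet_km(df_actual, df_fleet)
MAX_KM = 0.1  # assumed tolerance: keep Actual points within 100 m of the Fleet track
trimmed = df_actual[df_actual["nearest to fleet(km)"] < MAX_KM]
print(trimmed)
```

The tolerance should be tuned to the GPS noise and sampling gap of the real data; too small a value leaves holes in the trimmed track, too large a value lets in parts of the route the Fleet never covered.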
If you want all the Fleet points within 100 m of each Actual point, you can do:
for ai, a in df_actual.iterrows():
    actual = lonlat(a['Longitude'], a['Latitude'])
    # Boolean mask: True for Fleet rows closer than 100 m to this Actual point
    mask = df_fleet.apply(lambda x:
                          distance(lonlat(x['Longitude'], x['Latitude']), actual).meters < 100,
                          axis=1)
    print(f"for {(a['Longitude'], a['Latitude'])}")
    print(df_fleet[mask])
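The same "all Fleet points within 100 m" query can be expressed as one pairwise distance matrix plus a boolean mask, which avoids calling geopy once per pair. This is a sketch under stated assumptions: it uses the spherical haversine formula instead of geopy's geodesic, `haversine_matrix_m` is a hypothetical helper, and the sample data are made up. For the real data the full matrix would have about 96 million entries (12053 × 8000), so it should be computed in chunks of Actual rows.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371000.0  # spherical Earth approximation, in metres

def haversine_matrix_m(lat1, lon1, lat2, lon2):
    """Pairwise great-circle distances in metres between two sets of points (degrees)."""
    lat1, lon1 = np.radians(lat1)[:, None], np.radians(lon1)[:, None]
    lat2, lon2 = np.radians(lat2)[None, :], np.radians(lon2)[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Hypothetical sample data
df_actual = pd.DataFrame({"Latitude": [66.3603, 66.3462, 66.3200],
                          "Longitude": [23.7148, 23.6880, 23.6400]})
df_fleet = pd.DataFrame({"Latitude": [66.3462, 66.34625, 66.3000],
                         "Longitude": [23.6880, 23.68805, 23.6000]})

d = haversine_matrix_m(df_actual["Latitude"].to_numpy(), df_actual["Longitude"].to_numpy(),
                       df_fleet["Latitude"].to_numpy(), df_fleet["Longitude"].to_numpy())
within = d < 100  # (n_actual, n_fleet) boolean mask: Fleet point closer than 100 m

# Row position of each Actual point -> row positions of the Fleet points within 100 m
matches = {ai: np.nonzero(row)[0].tolist() for ai, row in enumerate(within)}
for ai, fleet_pos in matches.items():
    a = df_actual.iloc[ai]
    print(f"for {(a['Longitude'], a['Latitude'])}")
    print(df_fleet.iloc[fleet_pos])
```

`within.any(axis=1)` on the same mask gives a direct filter for trimming: it is True for exactly those Actual points that have at least one Fleet point within 100 m.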
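The last solution is based on a tree search, which in my experience is very fast. I use scipy.spatial to find the nearest points, and it returns Euclidean distances. To make those distances meaningful, I first convert each (lat, lon) into an (x, y, z) point on a sphere of radius 6371 km, so the results approximate geodesic/haversine distances. Here I generate two random (lat, lon) DataFrames with 15000 and 10000 rows and, for each point in df1, search for the five nearest points in df2.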
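Why the 3D conversion gives the right neighbours: for two points separated by a central angle θ on a sphere of radius R, the straight-line (chord) distance that cKDTree measures grows monotonically with the great-circle arc length over 0 ≤ θ ≤ π, so ranking by chord is the same as ranking by arc. The relations below (spherical model only, matching `to_cartesian`) also show how to turn the returned chord c back into an arc length s, and that the two are practically identical at GPS-track scales:

$$
\begin{aligned}
\mathbf{p} &= R\,(\cos\varphi\cos\lambda,\ \cos\varphi\sin\lambda,\ \sin\varphi) \\
c &= \lVert \mathbf{p}_1 - \mathbf{p}_2 \rVert = 2R\sin\frac{\theta}{2}, \qquad s = R\theta \\
s &= 2R\arcsin\frac{c}{2R}, \qquad s - c \approx \frac{s^3}{24R^2}
\end{aligned}
$$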
from random import uniform
from math import radians, sin, cos
from scipy.spatial import cKDTree
import pandas as pd
import numpy as np

def to_cartesian(lat, lon):
    """Convert (lat, lon) in degrees to (x, y, z) in km on a sphere."""
    lat = radians(lat); lon = radians(lon)
    R = 6371  # mean Earth radius in km
    x = R * cos(lat) * cos(lon)
    y = R * cos(lat) * sin(lon)
    z = R * sin(lat)
    return x, y, z

def newpoint():
    # Random test point as (lon, lat), within 23-24 E and 66-67 N
    return uniform(23, 24), uniform(66, 67)

def ckdnearest(gdA, gdB, bcol):
    # bcol is not used
    nA = np.array(list(zip(gdA.x, gdA.y, gdA.z)))
    nB = np.array(list(zip(gdB.x, gdB.y, gdB.z)))
    btree = cKDTree(nB)
    # For each point of df1, find the 5 nearest points (k=5) of df2;
    # dist holds straight-line (chord) distances in km
    dist, idx = btree.query(nA, k=5)
    # One row per df1 point; each cell holds an array of 5 values
    df = pd.DataFrame.from_dict({'distance': list(dist),
                                 'index of df2': list(idx)})
    return df
# Create the first DataFrame (actual)
n = 15000
lon, lat = [], []
for x, y in (newpoint() for _ in range(n)):
    lon.append(x); lat.append(y)
df1 = pd.DataFrame({'lat': lat, 'lon': lon})
df1['x'], df1['y'], df1['z'] = zip(*map(to_cartesian, df1.lat, df1.lon))
#-----------------------
# Create the second DataFrame (fleet)
n = 10000
lon, lat = [], []
for x, y in (newpoint() for _ in range(n)):
    lon.append(x); lat.append(y)
df2 = pd.DataFrame({'lat': lat, 'lon': lon})
df2['x'], df2['y'], df2['z'] = zip(*map(to_cartesian, df2.lat, df2.lon))
#-----------------------
df = ckdnearest(df1, df2, 'unused')
print(df)
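To apply the tree search to the original trimming requirement, one option is to query only the single nearest Fleet point and cap the search radius with cKDTree's `distance_upper_bound`; Actual points with no Fleet point inside the cap come back with an infinite distance and can be dropped. This is a sketch, assuming a spherical Earth, the `lat`/`lon` column names used above, a hypothetical 0.1 km tolerance and hypothetical helper names (`to_xyz`, `trim_to_fleet`); the sample data are made up.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

R = 6371.0  # km, same spherical radius as to_cartesian

def to_xyz(lat_deg, lon_deg):
    """Vectorized equivalent of to_cartesian: degrees -> (n, 3) array in km."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack((R * np.cos(lat) * np.cos(lon),
                            R * np.cos(lat) * np.sin(lon),
                            R * np.sin(lat)))

def trim_to_fleet(df_actual, df_fleet, max_km=0.1):
    """Keep Actual rows whose nearest Fleet point is within max_km (great-circle)."""
    tree = cKDTree(to_xyz(df_fleet["lat"].to_numpy(), df_fleet["lon"].to_numpy()))
    # Convert the arc-length threshold into the equivalent chord length
    max_chord = 2 * R * np.sin(max_km / (2 * R))
    chord, _ = tree.query(to_xyz(df_actual["lat"].to_numpy(), df_actual["lon"].to_numpy()),
                          k=1, distance_upper_bound=max_chord)
    keep = np.isfinite(chord)  # inf means no Fleet point inside the bound
    out = df_actual[keep].copy()
    # Convert chord distances back to great-circle arc lengths in km
    out["nearest to fleet(km)"] = 2 * R * np.arcsin(np.clip(chord[keep] / (2 * R), 0.0, 1.0))
    return out

# Hypothetical sample data; the Fleet rows are copies of Actual rows 1-3
df_actual = pd.DataFrame({"lat": [66.3603, 66.3462, 66.3200, 66.2844, 66.2792],
                          "lon": [23.7148, 23.6880, 23.6400, 23.5824, 23.5686]})
df_fleet = pd.DataFrame({"lat": [66.3462, 66.3200, 66.2844],
                         "lon": [23.6880, 23.6400, 23.5824]})

trimmed = trim_to_fleet(df_actual, df_fleet)
print(trimmed)
```

Building the tree once and querying with a bound keeps the cost near O((n + m) log m) instead of the n × m pairs of the geopy loop.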
If you only want the single nearest point, without the Cartesian conversion (the returned distance is then in degrees of latitude/longitude, not kilometres):
def ckdnearest(gdA, gdB, bcol):
    nA = np.array(list(zip(gdA.lat, gdA.lon)))
    nB = np.array(list(zip(gdB.lat, gdB.lon)))
    btree = cKDTree(nB)
    dist, idx = btree.query(nA, k=1)  # search the single nearest point of df2
    df = pd.DataFrame.from_dict({'distance': dist, 'index of df2': idx})
    return df
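One caveat with the plain (lat, lon) tree: it treats one degree of longitude as the same length as one degree of latitude, but at about 66° N a degree of longitude is only roughly cos(66°) ≈ 0.41 times as long, so the "nearest" point can be the wrong one. A minimal sketch of a fix, assuming the data cover a small area so an equirectangular projection (longitude scaled by the cosine of a reference latitude) is adequate; `ckdnearest_equirect` and the sample points are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

KM_PER_DEG = 6371.0 * np.pi / 180.0  # about 111.19 km per degree on a 6371 km sphere

def ckdnearest_equirect(gdA, gdB):
    """Nearest df2 point for each df1 point, using an equirectangular projection (km)."""
    lat0 = np.radians(np.concatenate([gdA.lat, gdB.lat]).mean())  # reference latitude

    def project(df):
        # x: longitude scaled by cos(lat0); y: latitude; both converted to km
        return np.column_stack((df.lon.to_numpy() * np.cos(lat0),
                                df.lat.to_numpy())) * KM_PER_DEG

    btree = cKDTree(project(gdB))
    dist, idx = btree.query(project(gdA), k=1)  # dist is approximately in km
    return pd.DataFrame({'distance': dist, 'index of df2': idx})

# Hypothetical sample: candidate 0 is 0.009 deg east, candidate 1 is 0.006 deg north
df1 = pd.DataFrame({'lat': [66.0], 'lon': [23.0]})
df2 = pd.DataFrame({'lat': [66.0, 66.006], 'lon': [23.009, 23.0]})
result = ckdnearest_equirect(df1, df2)
print(result)
```

With the unscaled (lat, lon) tree the same input returns index 1 (0.006 deg is smaller than 0.009 deg), whereas after scaling index 0 is correctly chosen, at roughly 0.41 km versus 0.67 km.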