Inconsistent results when matching nearest latitude/longitude points in R using sf and st_distance()


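I have a large dataset in which each row is a station. Within each year, I need to find, for each station, the nearest station where a different type of gear was used. I then want to combine these rows into a new dataset in which each pair of stations has its latitude/longitude and other station information side by side in the same row, or has some kind of index so I know which rows are related. I have managed to do this by following this answer and plotted the result, but some stations appear to be linked to stations that are clearly not the nearest. I can't tell whether this is due to the way I'm plotting the data or the way I'm joining the nearest stations. Any help would be appreciated! I'd also be interested in a more efficient approach.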

Thanks in advance for your help!

Example code:

library(ggplot2)
library(plotly)
library(sf)
library(ggspatial)   # geom_spatial_point(), geom_spatial_segment()
library(ggOceanMaps) # basemap()

#data
set.seed(123)
latitude <- runif(100, 72, 81)
longitude <- runif(100, 20, 60)
gear <- factor(sample(1:2, 100, replace = TRUE))
year <- factor(sample(c(2020, 2021), 100, replace = TRUE))
orig.data <- data.frame(latitude, longitude, gear, year)


orig.data$lat<-orig.data$latitude # duplicating lat/long columns 
orig.data$lon<-orig.data$longitude
df = st_as_sf(orig.data, coords=5:6) # making last 2 columns sf coordinates
# creating distance matrix
dm = st_distance(df)
ijd = data.frame(expand.grid(i=1:nrow(dm), j=1:nrow(dm)))
ijd$distance = c(dm)

# these following lines are a clunky way of copying the important info for each station pair
ijd$year.i = df$year[ijd$i] 
ijd$year.j = df$year[ijd$j]
ijd$gear.i = df$gear[ijd$i]
ijd$gear.j = df$gear[ijd$j]
ijd$latitude.j = df$latitude[ijd$j]
ijd$longitude.j = df$longitude[ijd$j]
ijd$latitude.i = df$latitude[ijd$i]
ijd$longitude.i = df$longitude[ijd$i]

# Keep only pairs from the same year that use different gears.
# Because the gears must differ, a point can't be a nearest neighbour of itself.
ijd = ijd[ijd$year.i == ijd$year.j,]
ijd = ijd[ijd$gear.i != ijd$gear.j,]

# selecting the closest stations
# Split into data frames for each i point.
ijd.split = split(ijd, ijd$i)

nearest = function(d){
  d = d[order(d$distance),]
  d[1:min(c(nrow(d),1)),]
}

dn = lapply(ijd.split,nearest)
nnij = do.call(rbind, dn)

# keeping only pairs that start from a gear-1 station, so each pair is listed once
nnij2<-subset(nnij, as.factor(gear.i)==1)

# plotting closest stations using 'geom_segment'
# plot clearly shows some stations are joined to ones further away than the logical 'closest' station
ggplot(data = nnij2, aes(x = longitude.i, y = latitude.i, shape = gear.i))+geom_point()+geom_point(data = nnij2, aes(x = longitude.j, y = latitude.j, shape = gear.j))+
  geom_segment(data = nnij2, aes(x = longitude.i, y = latitude.i, xend = longitude.j, yend = latitude.j, colour = distance))+
  facet_wrap(~year.i)

# issue persists when projecting coordinates
ggplotly(basemap(limits=c(25,40,72,79))+
           geom_spatial_point(data = nnij2, aes(x = longitude.i, y = latitude.i, shape = gear.i)) +
           geom_spatial_point(data = nnij2, aes(x = longitude.j, y = latitude.j, shape = gear.j))+
           geom_spatial_segment(data = nnij2, aes(x = longitude.i, y = latitude.i, xend = longitude.j, yend = latitude.j, colour = distance))+
           facet_wrap(~year.i))
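
On the question of a more efficient approach, one option is `st_nearest_feature()`, which avoids building the full distance matrix. The following is a minimal sketch rather than a drop-in replacement: it assumes the coordinates are WGS84 degrees (EPSG:4326) and that every year contains at least one station of each gear. The names `stations`, `stations_sf`, `pair_one_year` and `pairs` are hypothetical and introduced only for this illustration.

```r
library(sf)

# Same simulated data as in the example above.
set.seed(123)
stations <- data.frame(
  latitude  = runif(100, 72, 81),
  longitude = runif(100, 20, 60),
  gear      = factor(sample(1:2, 100, replace = TRUE)),
  year      = factor(sample(c(2020, 2021), 100, replace = TRUE))
)

# Assumption: coordinates are WGS84 degrees. Note the x/y order (longitude first).
# remove = FALSE keeps the longitude/latitude columns alongside the geometry.
stations_sf <- st_as_sf(stations, coords = c("longitude", "latitude"),
                        crs = 4326, remove = FALSE)

# For one year, pair every gear-1 station with its nearest gear-2 station.
pair_one_year <- function(d) {
  g1 <- d[d$gear == 1, ]
  g2 <- d[d$gear == 2, ]
  idx <- st_nearest_feature(g1, g2)  # row index into g2 for each row of g1
  data.frame(
    year        = g1$year,
    latitude.i  = g1$latitude,
    longitude.i = g1$longitude,
    latitude.j  = g2$latitude[idx],
    longitude.j = g2$longitude[idx],
    # element-wise distance between each station and its match, in metres
    distance_m  = as.numeric(st_distance(g1, g2[idx, ], by_element = TRUE))
  )
}

# One row per gear-1 station, with both stations' coordinates side by side.
pairs <- do.call(rbind, lapply(split(stations_sf, stations_sf$year), pair_one_year))
```

The resulting `pairs` data frame has the same `.i`/`.j` column layout as `nnij2`, so the plotting code above could in principle be reused with it.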

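The red arrow in the image highlights one problematic link: the topmost point should have been connected to the point to its right, but is instead linked to the point below it.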
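
One possible cause worth ruling out (an assumption, not a confirmed diagnosis): the example passes coordinates to `st_as_sf()` without a `crs`, and in latitude-then-longitude order, so `st_distance()` compares raw degree values as if they were planar. At these latitudes a degree of longitude covers far less ground than a degree of latitude, so a neighbour to the east or west can be closer on the ground even when it is further away in degrees. The sketch below uses three made-up points (not taken from the data above) to show how the nearest neighbour can flip once a CRS is declared.

```r
library(sf)

# A station at 76N, 30E with two candidate neighbours (made-up coordinates):
# "east" is 3 degrees of longitude away, "south" is 2 degrees of latitude away.
pts <- data.frame(
  name = c("station", "east", "south"),
  lon  = c(30, 33, 30),
  lat  = c(76, 76, 74)
)

# No CRS: st_distance() treats the values as planar x/y, so distances are in degrees.
planar   <- st_as_sf(pts, coords = c("lon", "lat"))
d_planar <- as.numeric(st_distance(planar)[1, 2:3])

# With EPSG:4326 declared, st_distance() returns distances on the globe in metres.
geo   <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)
d_geo <- as.numeric(st_distance(geo)[1, 2:3])

candidates     <- pts$name[2:3]
nearest_planar <- candidates[which.min(d_planar)]  # "south": 2 degrees < 3 degrees
nearest_geo    <- candidates[which.min(d_geo)]     # "east": roughly 80 km vs 220 km
```

If this is what is happening in the example, declaring `crs = 4326` and passing longitude before latitude when building `df` would be the first thing to try.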

r ggplot2 distance sf closest