Grouping spatial points by distance in R: how to group/cluster spatial points so that the groups are a set distance apart

Question  Votes: 0  Answers: 2

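I have a set of 52 spatial points. I am trying to split these points into groups based on a distance of 11 km, i.e. divide all points into groups in which the points are no more than 11 km apart. I then want to assign each point to its group in a new column of the data frame.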

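I found several posts, but they did not get me where I want to be, because they require setting the number of clusters (which does not matter here). I tried adapting this post, but it gave me 52 different groups, which does not seem right because many of the points are within 11 km of each other, so I am looking for other options.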

df <- structure(list(station = c("BE01", "BE02", "BEUWM01", "BL01", 
"BL02", "PB01", "PB02", "PB03", "PB04", "PB05", "PB06", "PB07", 
"PB09", "PB10", "PB11", "PB12", "PB13", "PB14", "PB15", "PB16", 
"PB17", "PB18", "PB19", "PB20", "PB21", "PB22", "PB23", "PB24", 
"PB25", "PB26", "PB27", "PB28", "PB29", "PB30", "PB4G01", "PB4G02", 
"PBUWM01", "PBUWM02", "SA01", "SA02", "SA02b", "SA03", "SA04", 
"SA05", "SA06", "SA07", "SA11", "SAUWM01", "VB01", "VB02", "VB03", 
"VB04"), longitude = c(71.6546833333333, 71.6748333333333, 71.66293, 
72.4337833333333, 72.4347, 71.7342, 71.7632, 71.7992, 71.8092, 
71.8326916667, 71.8405, 71.8796, 71.96835, 71.9697666666667, 
71.9727, 71.9745666666667, 71.9385075, 71.8685, 71.8524, 71.8414, 
71.8294, 71.758275, 71.7578, 71.7468, 71.9733, 71.9795, 71.9741, 
71.9209, 71.8959, 71.8228, 71.7498, 71.7323, 71.9068, 71.7474, 
71.9398, 71.8329, 71.98115, 71.75197, 72.24793, 72.241, 72.23027, 
72.2569, 72.2812, 72.1980666667, 72.2116, 72.221, 72.2636, 72.24754, 
72.2155, 72.2405, 72.2156, 72.2488), latitude = c(-5.25671666666667, 
-5.2662, -5.24915, -5.2579, -5.2432, -5.2815, -5.2459, -5.2461, 
-5.2448, -5.23439583333, -5.2567, -5.2694, -5.24165, -5.33015, 
-5.3344, -5.37878333333333, -5.39709575, -5.4271, -5.4229, -5.4308, 
-5.4406, -5.456505, -5.3823, -5.3512, -5.2695, -5.3039, -5.3521, 
-5.4126, -5.4243, -5.4644, -5.3957, -5.3181, -5.2668, -5.4261, 
-5.2582, -5.445, -5.33995, -5.38898, -5.31183, -5.3056, -5.316745, 
-5.2985, -5.3347, -5.35026666667, -5.3686, -5.3225, -5.3327, 
-5.30114, -5.5452, -5.5207, -5.5247, -5.546)), row.names = c(NA, 
-52L), class = "data.frame")
r cluster-analysis geospatial
2 Answers
1
vote

We can do this as follows. First, convert the x/y coordinates to sf points:

library(sf)

points <- st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326)

Then we can get the 52 x 52 distance matrix, which gives the distance (in metres) between every pair of stations:

adj <- st_distance(points)
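As a rough cross-check on what the distance matrix contains, here is a minimal base R sketch of pairwise great-circle distances using the haversine formula. The helper name `haversine_matrix` and the spherical Earth radius of 6,371 km are assumptions made for illustration, so the values may differ slightly from what `st_distance` returns.

```r
# Hypothetical helper (not part of sf): pairwise great-circle distances in metres.
# Assumes a spherical Earth with radius 6,371 km.
haversine_matrix <- function(lon, lat, radius = 6371000) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  dlon <- outer(lon, lon, "-")
  dlat <- outer(lat, lat, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  # sqrt(a) goes first so that pmin() keeps the matrix dimensions
  2 * radius * asin(pmin(sqrt(a), 1))
}

# Three stations from the question: BE01, BE02, BL01
lon <- c(71.6546833, 71.6748333, 72.4337833)
lat <- c(-5.2567167, -5.2662, -5.2579)

d <- haversine_matrix(lon, lat)
round(d / 1000, 1)  # distances in km
```

BE01 and BE02 come out a few kilometres apart, while BL01 is far beyond the 11 km threshold from both.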

Next, we can convert this into a binary matrix that tells us whether each pair of stations is within 11 km of each other:

adj <- matrix(as.numeric(as.numeric(adj) < 11000), nrow = nrow(adj))

Note that this is an adjacency matrix, so we can easily turn it into a graph:

library(igraph)

g <- graph_from_adjacency_matrix(adj)

If we plot this graph, we see that there are 4 connected components, representing clusters of stations that are within 11 km of each other:

plot(g)
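To make "connected components" concrete, here is a small sketch on a toy adjacency matrix (the five-node matrix is invented for illustration and is not the station data). Passing `mode = "undirected"` and `diag = FALSE` is optional for finding components, but it avoids duplicate edges and self-loops in the plot.

```r
library(igraph)

# Toy adjacency matrix: nodes 1-2-3 form a chain, nodes 4-5 form a pair
toy_adj <- matrix(0, nrow = 5, ncol = 5)
toy_adj[1, 2] <- toy_adj[2, 1] <- 1
toy_adj[2, 3] <- toy_adj[3, 2] <- 1
toy_adj[4, 5] <- toy_adj[5, 4] <- 1

toy_g <- graph_from_adjacency_matrix(toy_adj, mode = "undirected", diag = FALSE)
comp <- components(toy_g)

comp$no          # number of connected components
comp$membership  # component label for each node
```

Nodes 1 and 3 are not directly linked, yet they land in the same component because both are linked to node 2.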

We can get the membership of these components and put it back into our original data frame:

df$group <- factor(components(g)$membership)

This correctly labels the stations that are within 11 km of each other, as we can see from the resulting plot:

library(ggplot2)

ggplot(rnaturalearth::ne_countries(scale = 10, returnclass = 'sf')) +
  geom_sf() +
  geom_point(data = df, aes(x = longitude, y = latitude, color = group)) +
  coord_sf(xlim = c(71.5, 72.5), ylim = c(-6, -5))
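One caveat worth sketching: connected components chain points together, so two stations in the same group can be more than 11 km apart as long as intermediate stations link them. This is the same grouping as single-linkage clustering cut at 11 km. If the requirement is that no two points in a group are more than 11 km apart, complete linkage is one alternative. The toy coordinates below are invented for illustration; with the real data, `as.dist()` applied to the numeric distance matrix would take the place of `dist()`.

```r
# Four toy points on a line (coordinates in metres); gaps of 7, 8 and 6 km
xy <- cbind(x = c(0, 7000, 15000, 21000), y = 0)
d <- dist(xy)

# Single linkage cut at 11 km: chained groups, like the graph components
single <- cutree(hclust(d, method = "single"), h = 11000)

# Complete linkage cut at 11 km: every pair within a group is <= 11 km apart
complete <- cutree(hclust(d, method = "complete"), h = 11000)

single
complete
```

Here single linkage puts all four points into one group even though the end points are 21 km apart, whereas complete linkage splits them into two groups of two.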

Created on 2023-09-26 with reprex v2.0.2


0
votes

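The time needed to compute st_distance grows with the square of the number of points, so it can be a bit slow: on my machine it takes 14 seconds for 3k points. The solution below is about 20 times faster for 3k points, i.e. 0.5 seconds (of course this may also depend on other factors, but it is still worth noting):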

library(sf)

## 3k random points in a hypothetical area of UTM zone 32N (EPSG:32632)
df <- data.frame(x = runif(3000,  500000,  640000),
                 y = runif(3000, 5000000, 5140000))
points <- st_as_sf(df, coords = c("x", "y"), crs = 32632)
## buffer each point by 1.5 km; overlapping buffers merge into one area
areas <- st_buffer(points, 1500, nQuadSegs = 4)
dissolved.areas <- st_cast(st_union(areas), "POLYGON")

## give each dissolved area an id, then look up the area containing each point
dissolved.areas <- sf::st_as_sf(dissolved.areas)
dissolved.areas$id <- 1:nrow(dissolved.areas)
points.with.cluster.id <- st_join(points, dissolved.areas)
df$group <- factor(points.with.cluster.id$id)

library(ggplot2)

ggplot() +
  geom_sf(data=dissolved.areas) +
  geom_point(data = df, aes(x = x, y = y), color = df$group, 
             size=1) +
  theme_bw()
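Note how the buffer radius relates to a distance threshold: two circular buffers of radius r overlap when their centres are less than 2r apart, so a 1.5 km buffer groups points that are within about 3 km of each other. A minimal sketch for an 11 km threshold, using invented toy coordinates in a projected CRS, might look like this (the variable names are illustrative):

```r
library(sf)

# Toy points in metres: the first two are 10 km apart, the third is 30 km further on
toy <- data.frame(x = c(500000, 510000, 540000), y = 5000000)
pts <- st_as_sf(toy, coords = c("x", "y"), crs = 32632)

threshold <- 11000                          # largest allowed gap, in metres
buffers <- st_buffer(pts, threshold / 2)    # half the threshold per point
blobs <- st_as_sf(st_cast(st_union(buffers), "POLYGON"))
blobs$id <- seq_len(nrow(blobs))

# Each point takes the id of the dissolved area it falls in
toy$group <- st_join(pts, blobs)$id
toy
```

For longitude/latitude data such as the stations in the question, the points would first need to be projected to a metric CRS with `st_transform()`; which CRS suits the area is something to check rather than assume. Because the buffers are polygon approximations of circles, pairs sitting almost exactly at the threshold may be classified either way.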