有没有一种方法可以识别与区域重叠而不仅仅是边界的地理?

问题描述 投票:2回答:3

我正在尝试获取邮政编码的数据集,并将其限制为仅在芝加哥范围内的邮政编码。但是,我尝试执行此合并的任何方式都会捕获太多或太少的邮政编码。这是一个可重现的示例:

## Load packages
library(tigris)
library(sf)
library(ggplot2)

## Load shapefiles
ZIPs <- tigris::zctas(cb = TRUE) 
ZIPs <- sf::st_as_sf(ZIPs)

places <- tigris::places(state = "17", cb = T)
chicago <- places[places$NAME == "Chicago",]
chicago <- sf::st_as_sf(chicago)

## Filter ZIPs to those within Chicago using st_intersects
overlap <- st_filter(ZIPs, chicago, .predicate = st_intersects) #Using st_intersects captures too many ZIPs

## Visualize ZIPs vs Chicago
ggplot() +
  geom_sf(data = overlap, color = "black", size = 1) +
  geom_sf(data = chicago, color = NA, fill = "blue", alpha = .25)

first attempt - captures too many ZIP codes

## Try again using st_within
overlap <- st_filter(ZIPs, chicago, .predicate = st_within) #Using st_within captures too few ZIPs

## Visualize ZIPs vs Chicago
ggplot() +
  geom_sf(data = overlap, color = "black", size = 1) +
  geom_sf(data = chicago, color = NA, fill = "blue", alpha = .25)

second attempt - captures too few

我也曾尝试将sp::over用于此任务,但遇到了同样的问题。显然有一些ZIP大多在芝加哥以外,但可以合理地重叠(例如,第一张地图左上角的三个ZIP)。但是,还有一些仅沿边界相交(例如右上角),甚至似乎根本不相交(右下角)。我想从这张地图exclude任何仅由边界相交的ZIP。有什么建议吗?

r sf sp
3个回答
1
投票

[这里,我提出了一个函数,该函数可以根据相交区域与原始区域的比率与阈值之比来过滤ZIPs。以下是如何使用此功能的示例。似乎threshold = 0.3效果很好,但是您可以根据需要设置任何阈值。

## Load packages
library(tigris)
library(sf)
library(ggplot2)
library(dplyr)

# A function that can filter ZIPs based on the ratio of intersected area to original area
# The default of the threshold is set to be 0.3
# If the ratio is larger than or equal to 0.3, that ZIPs would be kept
intersection_area <- function(x, y, threshold = 0.3){
  z <- st_intersection(x, y)
  z2 <- z %>% 
    mutate(Area_Inter = as.numeric(st_area(.))) %>%
    select(ZCTA5CE10, Area_Inter) %>%
    st_set_geometry(NULL)
  x2 <- x %>%
    st_filter(y, .predicate = st_intersects)  %>%
    mutate(Area = as.numeric(st_area(.))) %>%
    select(ZCTA5CE10, Area) %>%
    left_join(z2, by = "ZCTA5CE10") %>%
    mutate(Area_Ratio = Area_Inter/Area) %>%
    filter(Area_Ratio >= threshold)
  return(x2)
}

overlap <- intersection_area(ZIPs, chicago)

## Visualize ZIPs vs Chicago
ggplot() +
  geom_sf(data = overlap, color = "black", size = 1) +
  geom_sf(data = chicago, color = NA, fill = "blue", alpha = .25)

enter image description here


1
投票

我希望知识渊博的人可以为您提供更好的答案,以更好地了解正在发生的事情。现在,我可以通过排除st_touches返回TRUE的ZCTA进行一些改进。看来我们仍然得到了一些不受欢迎的ZCTAs。您还可以评估每个ZCTA与芝加哥的相交区域,以查看该区域是什么(以了解为什么会返还这些区域)-在某些情况下,我们谈论的是很多重叠。

overlap <- st_filter(ZIPs, chicago, .predicate = st_intersects)
overlap_extra <- st_filter(overlap, chicago, .predicate = st_touches)
nrow(overlap_extra) # Will remove 8 ZCTAs that are touching only
overlap_removed <- 
  overlap[-which(overlap$ZCTA5CE10 %in% overlap_extra$ZCTA5CE10), ]

ggplot() +
  geom_sf(data = overlap, color = "black", size = 1) +
  geom_sf(data = overlap_removed, color = "red", fill = "red", alpha = 0.2,
          size = 1) +
  geom_sf(data = chicago, color = NA, fill = "blue", alpha = .25)


area_intersections <- rep(NA, nrow(overlap_removed))
# Probably not the most efficient way of doing this -- 
for (i in seq(nrow(overlap_removed))) {
  area_intersections[i] <- 
    st_area(
      st_intersection(
        st_geometry(overlap_removed[i, ]), st_geometry(chicago)))
}
summary(area_intersections)
length(which(area_intersections < 1)) # 1 has less than 1m^2 overlap
length(which(area_intersections < 1000)) # 3 have less than 1km^2 overlap

# Small improvement -- if you really want to remove more ZCTAs
overlap_removed2 <- overlap_removed[-which(area_intersections < 1000), ]

ggplot() +
  geom_sf(data = overlap_removed, color = "black", size = 1) +
  geom_sf(data = overlap_removed2, color = "red", fill = "red", alpha = 0.2,
          size = 1) +
  geom_sf(data = chicago, color = NA, fill = "blue", alpha = .25)

0
投票

这是我想出的另一种选择,使用st_filter中的自定义谓词功能>

st_overlaps_with_threshold = function(x, y, threshold) {
  int = st_intersects(x, y)
  lapply(seq_along(int), function(ix)
    if (length(int[[ix]]))
        int[[ix]][which(as.numeric(suppressMessages(st_area(st_intersection(x[ix,], y[int[[ix]],])) / st_area(x[ix,]))) > threshold)]
    else
      integer(0)
  )
}

overlap <- st_filter(ZIPs, chicago, .predicate = st_overlaps_with_threshold, threshold = .05)
© www.soinside.com 2019 - 2024. All rights reserved.