我有一个包含经度、纬度和温度列的数据框,以及从here下载的单独的 geojson 文件,表示适合我的 df 中所有坐标的区域。
如何过滤温度 df 以便只记录适合 geojson 文件的区域?
我不知道从哪里开始,所以还没有尝试过任何事情!
例如,在温度 df 下:
dput(temperature_df)
structure(list(Lat = c(52, 53.23, 51.37), Long = c(-1.89, -2.45,
-2.1), Temp = c(13.67, 10.38, -2.45)), class = "data.frame", row.names = c(NA,
-3L))
我期待返回过滤后的 df:
纬度 | 长 | 温度 |
---|---|---|
52.00 | -1.89 | 13.67 |
53.23 | -2.45 | 10.38 |
序言
为了确保结果的有效性,您必须知道哪个坐标系 (CRS) 构成两个数据集的坐标基础。您声明设置 WGS84/EPSG:4326 的 CRS 可以正确绘制您的点,但除非您知道您的点坐标是 WGS84,否则可能会导致错误的结果。请参阅此答案,以更深入地讨论为什么不能只是假设。
此解决方案基于温度 df 中的坐标为 WGS84/EPSG:4326 的假设。 geojson文件的CRS是OSGB36/英国国家电网。鉴于数据来自英国 ONS,我们可以 100% 确定 CRS 是正确的,因此在此示例中它将用作基础 CRS。
加载所需的包和数据
library(sf)
library(dplyr)
library(ggplot2)
# Read and subset geojson file from your working directory, downloaded from
# https://geoportal.statistics.gov.uk/datasets/7c23fbe8e89d4cf79ff7f2a6058e6200_0/explore
sf_poly <- read_sf("Regions_December_2023_Boundaries_EN_BFC_4211903881402860639.geojson") %>%
filter(RGN23NM == "South West")
sf_poly["RGN23NM"]
# Simple feature collection with 1 feature and 1 field
# Geometry type: MULTIPOLYGON
# Dimension: XY
# Bounding box: xmin: 82668.52 ymin: 5336.966 xmax: 435920 ymax: 246052.2
# Projected CRS: OSGB36 / British National Grid
# # A tibble: 1 × 2
# RGN23NM geometry
# <chr> <MULTIPOLYGON [m]>
# 1 South West (((83962.84 5401.15, 83970.68 5400.71, 83977.49 5404.05, 83985.27 540…
现在创建点数据的 sf:
# Your point data (assumed to be WGS84/EPSG:4326)
points <- read.table(text = "Lat Long Temp
52.00 -1.89 13.67
53.23 -2.45 10.38
51.37 -2.10 -2.45", header = TRUE) %>%
mutate(ID = 1:n()) # Added a unique ID here for illustrative purposes
# Create sf from your df, assign assumed WGS84 CRS, then project to CRS of sf_poly
sf_points <- st_as_sf(points, coords = c("Long", "Lat"), crs = 4326) %>%
st_transform(st_crs(sf_poly))
sf_points
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 370057.6 ymin: 163442.6 xmax: 407648.6 ymax: 370423.5
# Projected CRS: OSGB36 / British National Grid
# Temp ID geometry
# 1 13.67 1 POINT (407648.6 233512.2)
# 2 10.38 2 POINT (370057.6 370423.5)
# 3 -2.45 3 POINT (393135.4 163442.6)
请注意,sf_poly 和 sf_points 具有相同的 CRS,但如上所述,您只能确定这是有效的 if 您的原始点坐标是 WGS84。如果您知道它们位于不同的 CRS 中,请修改
crs = 4326
以匹配实际的 CRS。另外,请注意经度和纬度在例如中声明的顺序。 coords = c("Long", "Lat")
。需要首先声明经度/x 轴值。如果您应用其他答案中的代码,您的点将绘制在索马里海岸附近。
使用poly_sf对你的points_sf进行子集化
我提供了两个选项:
st_intersection()
,仅返回poly_sf内的点。您可以安全地忽略该警告:# Return sf_points only if they fall within sf_poly
sf_inter <- st_intersection(sf_points, sf_poly)
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries
sf_inter[,c("ID", "RGN23NM")]
# Simple feature collection with 2 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 393135.4 ymin: 163442.6 xmax: 407648.6 ymax: 233512.2
# Projected CRS: OSGB36 / British National Grid
# ID RGN23NM geometry
# 1 1 South West POINT (407648.6 233512.2)
# 3 3 South West POINT (393135.4 163442.6)
st_join()
,返回所有点,但每个点都被标识为sf_poly内部或外部:# Assign values to all sf_points
sf_join <- st_join(sf_points, sf_poly) %>%
mutate(RGN23NM = if_else(is.na(RGN23NM), "Outside Area", RGN23NM))
sf_join[,c("ID", "RGN23NM")]
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 370057.6 ymin: 163442.6 xmax: 407648.6 ymax: 370423.5
# Projected CRS: OSGB36 / British National Grid
# ID RGN23NM geometry
# 1 1 South West POINT (407648.6 233512.2)
# 2 2 Outside Area POINT (370057.6 370423.5)
# 3 3 South West POINT (393135.4 163442.6)
结果,使用 sf_join 进行说明:
ggplot() +
geom_sf(data = sf_poly) +
geom_sf(data = sf_join, aes(colour = RGN23NM), size = 2) +
scale_colour_discrete(name = "Temperature\nPoints")
sf::st_as_sf()
和sf::st_intersects()
是你的朋友。
由于您不提供数据,我们使用取自
此处的
.geojson
文件。
tmp = tempfile(fileext = ".geojson")
download.file("https://raw.githubusercontent.com/gregoiredavid/france-geojson/master/communes.geojson",
tmp)
library(sf)
gj = read_sf(tmp)
gj = gj[1L, ] # use just the first polygon
# some points randomly created around centre of gj
points = structure(list(lat = c(3.30454190848165, 3.29946780571954, 3.29612845330439, 3.27991321713169, 3.30485103998978), long = c(46.1807449149569, 46.1254499991144, 46.137126928633, 46.1617752630819, 46.1662531731711), temp = c(-6.46641247389143, -0.910962931755058, 32.8972983969805, 24.8380473564282, 11.9387508544478)), row.names = c(NA, -5L), class = "data.frame")
points
转换为 points_sf
根据多边形的crs设置crs。
# convert to sf object
points_sf = st_as_sf(x = points, coords = c("lat", "long"), crs = st_crs(gj))
> points_sf
Simple feature collection with 5 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3.279913 ymin: 46.12545 xmax: 3.304851 ymax: 46.18074
Geodetic CRS: WGS 84
temp geometry
1 -6.4664125 POINT (3.304542 46.18074)
2 -0.9109629 POINT (3.299468 46.12545)
3 32.8972984 POINT (3.296128 46.13713)
4 24.8380474 POINT (3.279913 46.16178)
5 11.9387509 POINT (3.304851 46.16625)
绘制两者:
library(ggplot2)
ggplot() +
geom_sf(data = gj) +
geom_sf(data = points_sf)
sf::st_intersects()
和 subset()
points_sf$inside = as.integer(st_intersects(points_sf, gj))
# filter:
points_sf = subset(points_sf, subset = inside == 1L, select = -inside)
我们得到:
> points_sf
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3.279913 ymin: 46.13713 xmax: 3.304851 ymax: 46.16625
Geodetic CRS: WGS 84
temp geometry
3 32.89730 POINT (3.296128 46.13713)
4 24.83805 POINT (3.279913 46.16178)
5 11.93875 POINT (3.304851 46.16625)
用过滤点绘制两者:
ggplot() +
geom_sf(data = gj) +
geom_sf(data = points_sf)
过滤后的潜在再转换
points_sf
:
cbind.data.frame(matrix(unlist(points_sf$geometry), ncol = 2L, byrow = TRUE),
points_sf$temp) |>
`colnames<-`(c("lat", "lon", "temp"))
lat lon temp
1 3.296128 46.13713 32.89730
2 3.279913 46.16178 24.83805
3 3.304851 46.16625 11.93875