我正在开展一个个人项目,试图结合市政府的开放数据源。我之前几乎没有使用地理空间数据的经验,因此使用它来可视化见解已被证明是一个挑战。
本质上,我正在尝试结合两个数据源。
第一个是交通事件的经典 .csv 数据集 - 到目前为止,我已经花时间对其进行预处理和分析,并且对此感到非常满意。值得注意的是,交通事件的位置由 street_name 给出,我已将其标准化并分成 street 列,而不是坐标或其他标识符。其他列提供有关该行事件的更多信息(总共约 900k 行)。
基于此,我寻找了有关街道的最详细的数据集。不幸的是,符合我需求的几乎太详细了——大多数街道被分成更小的部分,我无法将我的事件数据分解到这些部分。为了正确显示数据,我需要将共享相同 street_name 的所有 LINESTRING 几何图形(在 geometry 列中)合并到一条连续线上,以便我可以使用街道名称作为键将整条街道与其上发生的任何事件相匹配。相关地,这些线也有一个 Shape_Length 属性,我不确定(考虑到我不熟悉)它是否必须手动求和,是否会事后重新计算,或者只要几何图形的坐标匹配,甚至首先是必要的。
总而言之,对我来说是一个挑战的文件看起来像这样:
dput(head(tsk_data, 5))
structure(list(OBJECTID = 1:5, MKN = c("NN7453", "NN7452", "NN7451",
"NN7450", "NN7449"), USEK_NAZ = c("Charvátova-pokračování",
"Jablunkovská-pokračování", "Vršovická-Vršovická", "Točitá-pokračování",
"Antala Staška-Panuškova"), DALNICE = c(0L, 0L, 0L, 0L, 0L),
TRIDASIL = c(0L, 0L, 0L, 0L, 0L), TRIDAMK = c(0L, 0L, 1L,
0L, 0L), TYPKOMUNIK = c(0L, 19L, 21L, 19L, 19L), SMEROVOST = c(3L,
0L, 0L, 0L, 1L), POSKYT = c("HMP-TSK", "HMP-TSK", "HMP-TSK",
"HMP-TSK", "HMP-TSK"), Shape_Length = c(0.000726434480686331,
0.000688214170240968, 0.000104557565766303, 0.000500798213310109,
0.000861111526683966), geometry = structure(list(structure(c(14.4208013050001,
14.420701298, 14.4204086710001, 14.420336804, 14.420311828,
14.420252503, 50.0818569370001, 50.081835529, 50.0818334850001,
50.0818486040001, 50.081881234, 50.082089963), dim = c(6L,
2L), class = c("XY", "LINESTRING", "sfg")), structure(c(14.508402357,
14.5084069600001, 14.5089489260001, 50.1391580700001, 50.1390268330001,
50.1388987450001), dim = 3:2, class = c("XY", "LINESTRING",
"sfg")), structure(c(14.449696617, 14.4497051230001, 50.06592993,
50.066034141), dim = c(2L, 2L), class = c("XY", "LINESTRING",
"sfg")), structure(c(14.4354879690001, 14.4349871740001,
50.0397977040001, 50.0397959100001), dim = c(2L, 2L), class = c("XY",
"LINESTRING", "sfg")), structure(c(14.4368686030001, 14.4368403500001,
14.4368092660001, 14.436791758, 14.4367725560001, 14.436084587,
50.040384198, 50.040429573, 50.0405017510001, 50.0405103030001,
50.0405145030001, 50.0404624990001), dim = c(6L, 2L), class = c("XY",
"LINESTRING", "sfg"))), class = c("sfc_LINESTRING", "sfc"
), precision = 0, bbox = structure(c(xmin = 14.420252503,
ymin = 50.0397959100001, xmax = 14.5089489260001, ymax = 50.1391580700001
), class = "bbox"), crs = structure(list(input = "WGS 84",
wkt = "GEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]]"), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(OBJECTID = NA_integer_,
MKN = NA_integer_, USEK_NAZ = NA_integer_, DALNICE = NA_integer_,
TRIDASIL = NA_integer_, TRIDAMK = NA_integer_, TYPKOMUNIK = NA_integer_,
SMEROVOST = NA_integer_, POSKYT = NA_integer_, Shape_Length = NA_integer_
), levels = c("constant", "aggregate", "identity"), class = "factor"), row.names = c(NA,
5L), class = c("sf", "data.frame"))
经过一番搜索,我发现最接近我认为必须完成的功能的是 st_combine 函数。然而,据我所知,默认行为是折叠 x 中的所有几何图形,这将破坏我项目的重点。相反,我只想折叠/合并那些在 street name (MKN) 列中共享相同值的值。我尝试通过子集设置(包括 for 循环)来解决此问题,但输出的列表似乎不遵循原始结构(同样,我不熟悉预期的对象类型和格式),并且纬度和经度坐标似乎四舍五入到不可用的程度。
重申一下,我正在尝试找到一种方法,根据其属性之一中的共享值有条件地合并现有 geojson 中的线几何图形(不是点或多边形)。
我只想折叠/合并街道名称 (MKN) 列中具有相同值的那些。
对于
sf
对象,您可以应用与常规 data.frames 和 tibbles 相同的 split-apply-combine 逻辑,如果您更喜欢基础 R,则可以通过 aggregate()
;或者当倾向于 Tidyverse 时使用 group_by()
/ summarise()
。例如,让我们使用 roxel
中的 sfnetworks
数据集,检查 summarise()
之后许多特征和几何类型如何更改:
library(sf)
library(dplyr)
library(ggplot2)
# include only named sections and certain street types:
streets <- sfnetworks::roxel %>%
filter(`type` %in% c("residential", "secondary", "service") & !is.na(name)) %>%
arrange(name)
# every geometry is a LINESTRING:
streets
#> Simple feature collection with 466 features and 2 fields
#> Geometry type: LINESTRING
#> Dimension: XY
#> Bounding box: xmin: 7.522625 ymin: 51.94151 xmax: 7.546705 ymax: 51.95945
#> Geodetic CRS: WGS 84
#> # A tibble: 466 × 3
#> name type geometry
#> <chr> <fct> <LINESTRING [°]>
#> 1 Aloysia-Delsen-Weg residential (7.533476 51.95815, 7.533743 51.95769)
#> 2 Aloysia-Delsen-Weg residential (7.533743 51.95769, 7.53405 51.95717)
#> 3 Aloysia-Delsen-Weg residential (7.534638 51.95841, 7.535086 51.95852, 7.5351…
#> 4 Aloysia-Delsen-Weg residential (7.531509 51.95858, 7.531778 51.95807)
#> 5 Aloysia-Delsen-Weg residential (7.531778 51.95807, 7.531966 51.95773)
#> 6 Aloysia-Delsen-Weg residential (7.531778 51.95807, 7.532206 51.95805)
#> 7 Aloysia-Delsen-Weg residential (7.534638 51.95841, 7.534876 51.95799, 7.5348…
#> 8 Aloysia-Delsen-Weg residential (7.534875 51.95795, 7.534819 51.95792, 7.5337…
#> 9 Aloysia-Delsen-Weg residential (7.532206 51.95805, 7.532552 51.95804, 7.5329…
#> 10 Aloysia-Delsen-Weg residential (7.533287 51.95812, 7.533476 51.95815)
#> # ℹ 456 more rows
# text on every geometry
ggplot(streets, aes(label = name)) +
geom_sf(aes(color = name), linewidth = 1.5) +
geom_sf_text(size = 2, alpha = .8) +
theme_void() +
theme(legend.position = "none")
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
#> give correct results for longitude/latitude data
# union of streets by name, not all streets have multiple sections,
# this leaves us with mixed geometries (LINESTRING + MULTILINESTRING)
streets_union <- streets %>%
group_by(name) %>%
summarise() %>%
arrange(name)
# from 466 features down to 75:
streets_union
#> Simple feature collection with 75 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: 7.522625 ymin: 51.94151 xmax: 7.546705 ymax: 51.95945
#> Geodetic CRS: WGS 84
#> # A tibble: 75 × 2
#> name geometry
#> <chr> <GEOMETRY [°]>
#> 1 Aloysia-Delsen-Weg MULTILINESTRING ((7.533476 51.95815, 7.533743 51.9576…
#> 2 Alte Dorfstrasse LINESTRING (7.532418 51.95481, 7.532606 51.95483, 7.5…
#> 3 Alter Gemeindeplatz LINESTRING (7.532275 51.9556, 7.533447 51.95562, 7.53…
#> 4 Am Meckelbach MULTILINESTRING ((7.530726 51.94829, 7.531044 51.9486…
#> 5 Am Rohrbusch LINESTRING (7.541581 51.94878, 7.54253 51.94911)
#> 6 An der Kleikuhle LINESTRING (7.529509 51.9569, 7.530009 51.95695, 7.53…
#> 7 Anna-Peuler-Weg MULTILINESTRING ((7.531971 51.95867, 7.532265 51.9587…
#> 8 Auf dem Dorn MULTILINESTRING ((7.531428 51.95399, 7.531267 51.9542…
#> 9 Bertolt-Brecht-Strasse MULTILINESTRING ((7.536333 51.94844, 7.536385 51.9481…
#> 10 Bredeheide MULTILINESTRING ((7.53881 51.95679, 7.540002 51.95732…
#> # ℹ 65 more rows
# text on every geometry
ggplot(streets_union, aes(label = name)) +
geom_sf(aes(color = name), linewidth = 1.5) +
geom_sf_text(size = 3, alpha = .8) +
theme_void() +
theme(legend.position = "none")
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
#> give correct results for longitude/latitude data
创建于 2023-07-25,使用 reprex v2.0.2