我有一个
sf
对象,有 2 列,名为 fclass 和 geometry。 sf
对象的列示例如下所示:
fclass geometry
1 secondary MULTILINESTRING ((598675.9 ...
2 secondary MULTILINESTRING ((600641.7 ...
3 residential MULTILINESTRING ((601734.8 ...
4 residential MULTILINESTRING ((601163.9 ...
5 residential MULTILINESTRING ((601104.2 ...
6 motorway_link MULTILINESTRING ((609432.8 ...
我对fclass专栏感兴趣。 fclass 有几个级别。基本上,级别是道路网络的类别。级别如下所示:
unique(v$fclass)
[1] "secondary" "residential" "motorway_link" "service" "primary" "unclassified" "motorway"
[8] "tertiary" "trunk" "primary_link" "footway" "track" "secondary_link" "unknown"
[15] "living_street" "pedestrian" "path" "bridleway" "steps" "trunk_link" "track_grade1"
[22] "track_grade3" "track_grade5" "cycleway" "track_grade4" "tertiary_link" "track_grade2"
我正在使用
sf
对象来预测另一个栅格,称为 ntl。我感兴趣的是,当我尝试预测 ntl 时,道路网络的级别(类别)组合会产生线性模型中最大的 r 平方。因此,我使用 fclass 列(其中包含道路网络的类别)来查找可以更好地预测我的响应变量的“最佳”级别组合。
因此,我使用第二个名为 pop 的栅格(我用它来提取扩展名和像素大小)将这个
sf
对象转换为 spatraster
,我对其进行重新采样(以匹配我的响应的像素大小)并执行线性回归。
我的目标是使用逐步线性回归和向后消除来找到哪些唯一值在 lm 模型中给出最大的 r 平方。我最初的尝试是:
library(pacman)
pacman::p_load(terra, sf, dplyr)
ntl <- rast("path/ntl.tif") # response
v <- st_read("path/road17.shp") # sf object
# baseline r2
vterra <- vect(v)
ref <- rast("path/pop.tif") # get ext and pixel size
r <- rast(v, res = res(ref), ext = ext(ref))
x <- rasterizeGeom(vterra, r, "length", "m") # calculate road density raster
x_res <- resample(x, ntl, "average") # resample the road raster
s <- c(ntl, x_res) # raster stack
names(s) <- c("ntl", "road")
m <- lm(ntl ~ road, s, na.action = na.exclude) # linear model using all the classes of the road network
baseline <- sqrt(summary(m)$adj.r.squared)
orig_baseline <- sqrt(summary(m)$adj.r.squared)
classes <- unique(v$fclass)
inclasses <- unique(v$fclass)
i <- 1
while (i <= length(classes)) {
class <- classes[i]
print(paste0("current class ", class))
print(paste0("orig baseline: ", orig_baseline, " - baseline: ", baseline))
print(classes)
v_filtered <- v[v$fclass != class, ]
vterra <- vect(v_filtered)
r <- rast(v, res = res(ref), ext = ext(ref))
x <- rasterizeGeom(vterra, r, "length", "m")
x_res <- resample(x, ntl, "average")
s <- c(ntl, x_res)
names(s) <- c("ntl", "road")
m <- lm(ntl ~ road, s, na.action = na.exclude)
class_r2 <- sqrt(summary(m)$adj.r.squared)
if (class_r2 > baseline) {
classes <- classes[-i]
baseline <- class_r2
} else {
print("class_r2 is less than baseline")
print(paste0("class_r2: ", class_r2, " - baseline: ", baseline))
i <- i + 1
}
}
但效率不高(大约需要 5 分钟)。
数据集可以从GoogleDrive下载。
你的循环。工作正常 如果有效那就不傻了