使用逐步线性回归找到最佳模型

问题描述 投票:0回答:1

我有一个

sf
对象,有 2 列,名为 fclassgeometry
sf
对象的列示例如下所示:

 fclass                       geometry
    1     secondary MULTILINESTRING ((598675.9 ...
    2     secondary MULTILINESTRING ((600641.7 ...
    3   residential MULTILINESTRING ((601734.8 ...
    4   residential MULTILINESTRING ((601163.9 ...
    5   residential MULTILINESTRING ((601104.2 ...
    6 motorway_link MULTILINESTRING ((609432.8 ...

我对fclass专栏感兴趣。 fclass 有几个级别。基本上,级别是道路网络的类别。级别如下所示:

unique(v$fclass)
 [1] "secondary"      "residential"    "motorway_link"  "service"        "primary"        "unclassified"   "motorway"      
 [8] "tertiary"       "trunk"          "primary_link"   "footway"        "track"          "secondary_link" "unknown"       
[15] "living_street"  "pedestrian"     "path"           "bridleway"      "steps"          "trunk_link"     "track_grade1"  
[22] "track_grade3"   "track_grade5"   "cycleway"       "track_grade4"   "tertiary_link"  "track_grade2"

我正在使用

sf
对象来预测另一个栅格,称为 ntl。我感兴趣的是,当我尝试预测 ntl 时,道路网络的级别(类别)组合会产生线性模型中最大的 r 平方。因此,我使用 fclass 列(其中包含道路网络的类别)来查找可以更好地预测我的响应变量的“最佳”级别组合。

因此,我使用第二个名为 pop 的栅格(我用它来提取扩展名和像素大小)将这个

sf
对象转换为
spatraster
,我对其进行重新采样(以匹配我的响应的像素大小)并执行线性回归。

我的目标是使用逐步线性回归和向后消除来找到哪些唯一值在 lm 模型中给出最大的 r 平方。我最初的尝试是:

library(pacman)
pacman::p_load(terra, sf, dplyr)

ntl <- rast("path/ntl.tif") # response

v <- st_read("path/road17.shp") # sf object

# baseline r2
vterra <- vect(v)

ref <- rast("path/pop.tif") # get ext and pixel size

r <- rast(v, res = res(ref), ext = ext(ref))

x <- rasterizeGeom(vterra, r, "length", "m") # calculate road density raster

x_res <- resample(x, ntl, "average") # resample the road raster

s <- c(ntl, x_res) # raster stack
names(s) <- c("ntl", "road")

m <- lm(ntl ~ road, s, na.action = na.exclude) # linear model using all the classes of the road network
baseline <- sqrt(summary(m)$adj.r.squared)
orig_baseline <- sqrt(summary(m)$adj.r.squared)

classes <- unique(v$fclass)
inclasses <- unique(v$fclass)

i <- 1
while (i <= length(classes)) {
  class <- classes[i]
  print(paste0("current class ", class))
  print(paste0("orig baseline: ", orig_baseline, " - baseline: ", baseline))
  print(classes)
  
  v_filtered <- v[v$fclass != class, ]
  vterra <- vect(v_filtered)
  r <- rast(v, res = res(ref), ext = ext(ref))
  x <- rasterizeGeom(vterra, r, "length", "m")
  
  x_res <- resample(x, ntl, "average")
  
  s <- c(ntl, x_res)
  names(s) <- c("ntl", "road")
  
  m <- lm(ntl ~ road, s, na.action = na.exclude)
  class_r2 <- sqrt(summary(m)$adj.r.squared)
  
  if (class_r2 > baseline) {
    classes <- classes[-i]
    baseline <- class_r2
  } else {
    print("class_r2 is less than baseline")
    print(paste0("class_r2: ", class_r2, " - baseline: ", baseline))
    i <- i + 1
  }
}

但效率不高(大约需要 5 分钟)。

数据集可以从GoogleDrive下载。

r linear-regression piecewise
1个回答
0
投票

你的循环。工作正常 如果有效那就不傻了

© www.soinside.com 2019 - 2024. All rights reserved.