Number of points within a radius with a large dataset - R

Question

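I have county parcel-level shapefiles, and my goal is to count the number of parcels within one mile (about 1610 meters), as well as those held by the same owner. I have worked out a solution, and my example code is below, but it is very inefficient and memory-intensive. I can't post the data publicly, but here is the problem with some made-up data: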

library(sp)        # SpatialPointsDataFrame(), CRS() (rgdal/rgeos are retired and were not needed here)
library(geosphere) # distm()

nobs <- 1000  # number of observations
nowners <- 50 # number of different owners
crs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
long <- runif(nobs, min = -94.70073, max = -94.24141) # roughly Adair County, Iowa
lat <- runif(nobs, min = 41.15712, max = 41.50415)    # roughly Adair County, Iowa
coords <- cbind(long, lat)
owner <- sample(1:nowners, nobs, replace = TRUE) # assign owner IDs
df <- as.data.frame(owner)
centroids <- SpatialPointsDataFrame(coords, df, proj4string = CRS(crs)) # make spatial data frame

d <- distm(centroids) # nobs x nobs matrix of distances (m) between centroids

numdif <- numeric(nobs) # filled in by the loop
numtot <- numeric(nobs)
for (i in seq_len(nobs)) {
  same_id <- df$owner[i] == owner # parcels with the same owner ID as parcel i
  numdif[i] <- sum(d[i, ] < 1609.34 & !same_id) # parcels with a different owner
  numtot[i] <- sum(d[i, ] < 1609.34)            # all parcels, including parcel i itself
}

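The resulting "numdif" and "numtot" vectors give me what I want: for each parcel, the number of neighboring parcels with a different owner and the total number of neighboring parcels, respectively. However, for counties with a larger "nobs" this process is very slow and uses a lot of memory. Some counties have 50,000 to 75,000 observations (so the distance matrix d has billions of elements and may need more memory than I have). Does anyone have ideas for a better approach, from both a speed and a memory standpoint? Any help is much appreciated.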
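A quick back-of-the-envelope calculation shows why the full matrix does not fit. This sketch assumes R's usual storage sizes (8 bytes per double, 4 bytes per logical) and ignores any extra copies R makes along the way, so real peak usage is higher.

```r
# Rough memory needed for a full n x n distance matrix, as produced by distm().
n <- 75000
cells <- n^2                      # 5.625e9 elements
double_gib <- cells * 8 / 2^30    # distances stored as doubles
logical_gib <- cells * 4 / 2^30   # the logical matrix from d < 1609.34
round(c(double_gib = double_gib, logical_gib = logical_gib), 1)
```

That is roughly 42 GiB for the distances alone, before the comparison allocates another 21 GiB logical matrix.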

Answer 1

You can do the counting with apply:

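d <- d < 1609.34  # logical matrix: TRUE where parcel j is within one mile of parcel i
nt <- rowSums(d)  # total parcels within one mile, including parcel i itself
# distinct owners within one mile, minus 1 for parcel i's own owner
nd <- apply(d, 1, function(row) length(unique(owner[row]))) - 1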

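I don't think your numdif calculation is correct, because it counts another owner several times if that owner has more than one parcel in range.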
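A toy illustration of the difference, using made-up owner IDs for a single parcel's neighborhood (parcel 1 is the parcel being examined, so it is always within range of itself):

```r
# Hypothetical toy data: owners of five parcels and whether each lies within
# one mile of parcel 1.
owner  <- c(1, 2, 2, 3, 4)
within <- c(TRUE, TRUE, TRUE, FALSE, TRUE)  # parcel 4 is out of range

# Question's numdif: counts parcels, so owner 2 is counted twice
parcels_other_owner <- sum(within & owner != owner[1])

# Answer's nd: counts distinct owners in range, minus parcel 1's own owner
distinct_other_owners <- length(unique(owner[within])) - 1
```

Here the parcel-based count is 3 (parcels 2, 3 and 5), while only 2 other owners (2 and 4) are actually in range.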

Given the large number of observations, I would consider this route instead:

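library(geosphere) # distGeo()

# for each parcel, the indices of all parcels within one mile (one row of distances at a time)
d <- lapply(1:nrow(coords), function(i) which(distGeo(coords[i, , drop = FALSE], coords) < 1609.34))
ntot <- lengths(d)                                           # total parcels, including parcel i
ndif <- sapply(d, function(i) length(unique(owner[i]))) - 1  # distinct other owners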

This is slower, but it never creates an enormous n x n matrix.
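A possible middle ground between the full matrix and the one-row-at-a-time loop is to build the distance matrix in blocks of rows. The sketch below is not from the original answer; count_within_chunked and haversine_matrix are hypothetical helpers. haversine_matrix is a base-R stand-in for geosphere's haversine distance (same 6378137 m sphere radius) so the example has no package dependencies; with geosphere loaded you could pass dist_fun = function(a, b) distm(a, b, fun = distGeo) instead.

```r
# Hypothetical helper: great-circle (haversine) distances in metres between
# every row of `a` and every row of `b` (columns: longitude, latitude).
haversine_matrix <- function(a, b, r = 6378137) {
  to_rad <- pi / 180
  lat1 <- a[, 2] * to_rad
  lat2 <- b[, 2] * to_rad
  dlat <- outer(lat1, lat2, "-")
  dlon <- outer(a[, 1] * to_rad, b[, 1] * to_rad, "-")
  h <- sin(dlat / 2)^2 + outer(cos(lat1), cos(lat2)) * sin(dlon / 2)^2
  2 * r * asin(pmin(sqrt(h), 1))  # pmin guards against rounding slightly above 1
}

# Hypothetical helper: for each parcel, the number of parcels and of distinct
# other owners within `radius` metres. Only `chunk` rows of the distance
# matrix exist at any time, so memory grows with chunk * n instead of n^2.
count_within_chunked <- function(coords, owner, radius = 1609.34, chunk = 500,
                                 dist_fun = haversine_matrix) {
  n <- nrow(coords)
  ntot <- integer(n)
  ndif <- integer(n)
  for (start in seq(1, n, by = chunk)) {
    idx <- start:min(start + chunk - 1, n)
    within <- dist_fun(coords[idx, , drop = FALSE], coords) < radius
    ntot[idx] <- rowSums(within)  # includes the parcel itself
    # distinct owners in range, minus 1 for the parcel's own owner
    ndif[idx] <- apply(within, 1, function(row) length(unique(owner[row]))) - 1L
  }
  data.frame(ntot = ntot, ndif = ndif)
}

# Usage with made-up points on one meridian: about 1113 m between the first
# two points, more than 2 km to the third.
coords <- cbind(long = c(-94.5, -94.5, -94.5), lat = c(41.30, 41.31, 41.33))
owner <- c(1, 2, 2)
res <- count_within_chunked(coords, owner, chunk = 2)
res_one_block <- count_within_chunked(coords, owner, chunk = 3)
```

Smaller chunks lower peak memory (each intermediate matrix holds about chunk x n values) at the cost of more loop iterations; the chunk size can be tuned to the memory available.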

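I should also add that your approach assumes the parcels are small relative to the distance being considered, so that using centroids is acceptable. If that is not the case, you can do the computation on the polygons with rgeos::gWithinDistance, at a higher computational cost.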