Using the na.locf function (zoo package) with .SD in data.table

Question · Votes: 3 · Answers: 3

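I am trying to fill every NA with the most recent non-NA value, leaving in place the first two NAs in columns 1 and 4 and the first three NAs in columns 2 and 3. Here are my data and code: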

  hh<-structure(list(ka = c(NA, NA, 2, NA, NA, 3, NA, NA, NA, NA), 
        kb = c(NA, NA, NA, 2, NA, NA, 3, NA, NA, NA), gc = c(NA, 
        NA, NA, 3, NA, NA, 6, NA, NA, NA), hc = c(NA, NA, 8, NA, 
        NA, NA, 4, NA, NA, NA)), .Names = c("ka", "kb", "gc", "hc"
    ), row.names = c(NA, -10L), class = "data.frame")


library(zoo) #na.locf
library(data.table)

setDT(hh)[,`:=`(ka=c(NA,NA,na.locf(ka)),kb=c(NA,NA,NA,na.locf(kb)),gc=c(NA,NA,NA,na.locf(gc)),hc=c(NA,NA,na.locf(hc)))][]
    ka kb gc hc
 1: NA NA NA NA
 2: NA NA NA NA
 3:  2 NA NA  8
 4:  2  2  3  8
 5:  2  2  3  8
 6:  3  2  3  8
 7:  3  3  6  4
 8:  3  3  6  4
 9:  3  3  6  4
10:  3  3  6  4

However, I would like to do this with lapply over .SD, because I have more than two columns of each type. Is that possible?

r data.table zoo
3 Answers
7
votes

Try:

 setDT(hh)[, lapply(.SD, function(x) na.locf(x, na.rm=FALSE))]

Or use set, which updates hh by reference:

  for(j in seq_along(hh)){
    set(hh, i=NULL, j=j, value= na.locf(hh[[j]], na.rm=FALSE))
  }
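Since the question mentions having several columns of each type, the same idea can be restricted to a subset of columns and written back in place. The following is a minimal sketch, assuming the columns to fill are listed by name; `fill_cols` is a hypothetical variable that does not appear in the original answer.

```r
library(zoo)         # na.locf
library(data.table)

# Same data as in the question
hh <- data.table(
  ka = c(NA, NA, 2, NA, NA, 3, NA, NA, NA, NA),
  kb = c(NA, NA, NA, 2, NA, NA, 3, NA, NA, NA),
  gc = c(NA, NA, NA, 3, NA, NA, 6, NA, NA, NA),
  hc = c(NA, NA, 8, NA, NA, NA, 4, NA, NA, NA)
)

# Hypothetical choice of columns to fill; adjust to your own data
fill_cols <- c("ka", "hc")

# .SDcols limits .SD to the chosen columns; := assigns the result by reference.
# na.rm = FALSE keeps leading NAs so every column stays 10 rows long.
hh[, (fill_cols) := lapply(.SD, na.locf, na.rm = FALSE), .SDcols = fill_cols]
hh[]
```

Columns not named in `fill_cols` (here kb and gc) are left untouched.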

0
votes

You can use setnafill, which was introduced in the development version 1.12.3 of data.table (released on CRAN as 1.12.4):

setnafill(hh, type = "locf")
hh
#    ka kb gc hc
#  1 NA NA NA NA
#  2 NA NA NA NA
#  3  2 NA NA  8
#  4  2  2  3  8
#  5  2  2  3  8
#  6  3  2  3  8
#  7  3  3  6  4
#  8  3  3  6  4
#  9  3  3  6  4
# 10  3  3  6  4
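setnafill can also be limited to particular columns through its `cols` argument, and nafill is the copying counterpart for a single vector. A short sketch, assuming data.table 1.12.4 or later and numeric columns; the choice of `kb` and `gc` here is only for illustration.

```r
library(data.table)

# Same data as in the question
hh <- data.table(
  ka = c(NA, NA, 2, NA, NA, 3, NA, NA, NA, NA),
  kb = c(NA, NA, NA, 2, NA, NA, 3, NA, NA, NA),
  gc = c(NA, NA, NA, 3, NA, NA, 6, NA, NA, NA),
  hc = c(NA, NA, 8, NA, NA, NA, 4, NA, NA, NA)
)

# Fill only kb and gc, by reference; ka and hc are not modified
setnafill(hh, type = "locf", cols = c("kb", "gc"))

# nafill returns a filled copy of a vector and leaves hh$ka as it was
ka_filled <- nafill(hh$ka, type = "locf")
```

No zoo dependency is needed for this variant.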

0
votes

You don't need lapply. This is enough:

DT <- as.data.table(hh)
DT[, na.locf(.SD, na.rm = FALSE)]

giving:

    ka kb gc hc
 1: NA NA NA NA
 2: NA NA NA NA
 3:  2 NA NA  8
 4:  2  2  3  8
 5:  2  2  3  8
 6:  3  2  3  8
 7:  3  3  6  4
 8:  3  3  6  4
 9:  3  3  6  4
10:  3  3  6  4

This also works:

DT[, lapply(.SD, na.locf0)]
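The reason the answers pass na.rm = FALSE or use na.locf0 is that na.locf drops leading NAs by default, so its result can be shorter than its input. A small illustration on a single vector; `x` is a made-up example vector, not taken from the question.

```r
library(zoo)

x <- c(NA, NA, 2, NA, NA, 3)

# Default na.rm = TRUE: the two leading NAs are removed
short <- na.locf(x)

# na.rm = FALSE keeps the leading NAs, so the length is preserved
full <- na.locf(x, na.rm = FALSE)

# na.locf0 works on plain vectors and always preserves the length
full0 <- na.locf0(x)

length(short)  # 4
length(full)   # 6
```

This is also why the code in the question had to prepend the NAs by hand with c(NA, NA, na.locf(ka)).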