As I'm new to the data.table package, I would like to replicate with data.table what I normally do with the data.frame structure below.
Dta <- data.frame(Customer = c("Javier","Oscar","Ivan","Peter"),Type_of_Customer=LETTERS[c(1,1:3)])
Dtb <- data.frame(Customer = c("Javier","Oscar","Ivan","Jack"),Zone=5:8,District=100:103)
# Look up each Dta customer in Dtb; customers missing from Dtb give NA rows
Result <- cbind(Dtb[match(Dta[,"Customer"],Dtb[,"Customer"]),c("Zone","District")],Dta)
# Replace the NA entries left by unmatched customers with a label
ww <- which(is.na(Result[,"Zone"]))
if(length(ww) > 0){
  Result[ww,"Zone"] <- "Not in Dtb"
}
ww <- which(is.na(Result[,"District"]))
if(length(ww) > 0){
  Result[ww,"District"] <- "Not in Dtb"
}
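So, if I have Dta and Dtb as data.table structures, what would be the way to do this? (Note: in my real case I have about 10 million rows, so I need a time-efficient solution.)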
library(data.table)
Dta <- data.table(Custumer = c("Javier","Oscar","Ivan","Peter"),Type_of_Customer=LETTERS[c(1,1:3)])
Dtb <- data.table(Custumer = c("Javier","Oscar","Ivan","Jack"),Zone=5:8,District=100:103)
We can use a join on the 'Custumer' column and replace the NA elements with the string 'Not in Dtb':
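# Join on Custumer (all rows of Dta are kept), convert Zone and District to
# character so they can hold text, then label the unmatched rows.
Dtb[Dta, on = .(Custumer)][, c("Zone", "District") :=
    .(as.character(Zone), as.character(District))
  ][is.na(Zone), c("Zone", "District") := "Not in Dtb"][]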
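Since the question mentions around 10 million rows, a variant that might be worth trying is an update join, which adds the columns to Dta by reference instead of building a new joined table. This is only a sketch, not part of the answer above. It assumes each Custumer appears at most once in Dtb (with duplicates, the last match would win) and that it is acceptable for Zone and District to end up as the last columns of Dta.

```r
library(data.table)

Dta <- data.table(Custumer = c("Javier", "Oscar", "Ivan", "Peter"),
                  Type_of_Customer = LETTERS[c(1, 1:3)])
Dtb <- data.table(Custumer = c("Javier", "Oscar", "Ivan", "Jack"),
                  Zone = 5:8, District = 100:103)

# Update join: add Zone and District to Dta in place.
# The i. prefix refers to columns of Dtb, the table passed as `i`.
Dta[Dtb, on = .(Custumer),
    c("Zone", "District") := list(as.character(i.Zone), as.character(i.District))]

# Rows of Dta without a match in Dtb are still NA; label them.
Dta[is.na(Zone), c("Zone", "District") := "Not in Dtb"]

print(Dta)
```

Row order and the existing columns of Dta are left untouched, and customers that exist only in Dtb (here "Jack") are ignored, which matches the data.frame version in the question.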