Identify differences between rows in a data.table and create a new column saying what the differences are

Question (votes: 1, answers: 1)

I have a dataset that looks like this:

library(data.table)

Data01 <- data.table(
  code=c("A111", "A111","A111","A111","A111", "A111","A111","A234", "A234","A234","A234","A234", "A234","A234"),
  x=c("",126,126,"",836,843,843,126,126,"",127,836,843,843), 
  y=c("",76,76,"",456,465,465,76,76,"",77,456,465,465),
  no1=c(028756, 028756,028756,057756, 057756, 057756, 057756,028756, 028756,057756,057756, 057756, 057756, 057756),
  no2=c("","",034756,"","","",789165,"",034756,"","","","",789165)
)

Data01[, version := paste0("V", 1:.N), by = code]
Data01[, unique_version := paste(code, version, sep = "_")]

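I would like to add a column that says, for each unique code entry, what the difference is between each row and the previous one (i.e. paste the name(s) of the column(s) that now hold a different value). Something like this: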

Data01[, change := c("First_entry","New_x_and_y","New_no2","New_x_and_y_and_no1","New_x_and_y","New_x_and_y","New_no2","First_entry","New_no2","New_x_and_y_and_no1","New_x_and_y","New_x_and_y","New_x_and_y","New_no2")]

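My actual dataset has 5.5 million rows and 3.6 million unique code entries, so I imagine any solution will take a while to run. It would therefore be really helpful to include some kind of progress indicator (something like what is suggested here: Progress bar in data.table aggregate action), if possible.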

r data.table
1 Answer

1 vote

You could try something like this:

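nm <- c("x","y","no1","no2") # columns to compare
Data01[, change := c("First_entry", 
        # vapply (rather than sapply) still returns a character vector for one-row groups
        vapply(seq_len(.N)[-1L], function(n) {
            # names of the columns whose value differs from the previous row
            paste(c("New", 
                nm[which(unlist(.SD[n-1L]) != unlist(.SD[n]))]), 
                collapse="_")
        }, character(1))), 
    by=.(code), .SDcols=nm] # .SDcols keeps version/unique_version out of the comparison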
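
With 5.5 million rows, calling a function once per row inside each group may be slow. As a rough sketch of an alternative (not part of the original answer), the previous row can be fetched for all groups at once with data.table's shift(), and the comparison done one column at a time. The table dt, the prev_ helper columns, the "_and_" separator and the "No_change" label are all assumptions made for illustration; values are compared as character strings, and a comparison involving NA is treated as "no change".

```r
library(data.table)

# Small made-up table (not the asker's data) with the same column layout
dt <- data.table(
  code = c("A1", "A1", "A1", "B2", "B2"),
  x    = c("", "126", "126", "126", "127"),
  y    = c("", "76", "76", "76", "76"),
  no1  = c(28756, 28756, 57756, 28756, 28756),
  no2  = c("", "", "34756", "", "")
)

nm      <- c("x", "y", "no1", "no2")   # columns to compare
prev_nm <- paste0("prev_", nm)         # hypothetical helper column names

# Previous row's value within each code group (NA on the first row of a group)
dt[, (prev_nm) := shift(.SD), by = code, .SDcols = nm]

# Build the label one column at a time; each step is vectorised over all rows
label <- rep("", nrow(dt))
for (col in nm) {
  differs <- as.character(dt[[col]]) != as.character(dt[[paste0("prev_", col)]])
  differs[is.na(differs)] <- FALSE     # NA comparisons count as "no change"
  label <- ifelse(differs,
                  ifelse(label == "", col, paste(label, col, sep = "_and_")),
                  label)
}

dt[, change := ifelse(label == "", "No_change", paste0("New_", label))]
dt[rowid(code) == 1L, change := "First_entry"]  # first row of each code
dt[, (prev_nm) := NULL]                          # drop the helper columns

print(dt$change)
# "First_entry" "New_x_and_y" "New_no1_and_no2" "First_entry" "New_x"
```

Because the work is done per column rather than per group, there is no long per-group loop to report on. If the grouped version is kept and a progress indicator is still wanted, one commonly used pattern is to create a bar with utils::txtProgressBar() and call setTxtProgressBar(pb, .GRP) inside j.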