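I am trying to read text files and build a data frame (called dataset) from about 12 specific columns, each located at a fixed character position and length, as follows: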
library(data.table)  # provides fread()
library(stringi)     # provides stri_sub()
x <- fread("file1.txt", colClasses = "character", sep = "\n", header = FALSE, verbose = FALSE, strip.white = FALSE)
y <- fread("file2.txt", colClasses = "character", sep = "\n", header = FALSE, verbose = FALSE, strip.white = FALSE)
# combine them
x = rbind(x,y)
# We basically read the whole file as a string and then read substrings
# corresponding to each variable start and finish lengths.
Var1 <- sapply(as.list(x$V1), stri_sub, from = 80, to = 82)
Var1 <- as.data.frame(Var1)
Var2 <- sapply(as.list(x$V1), stri_sub, from = 83, to = 89)
Var2 <- as.data.frame(Var2)
dataset <- cbind(Var1, Var2)
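Running this on two text files with 200K and 300K rows respectively takes about 1 minute. Each row has 1,800 characters. Is there a faster way to do this? I will be reading about 200 such files.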
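One direction that may be worth timing: stri_sub() is vectorized over its input, so it can be applied to the whole character vector in a single call instead of once per line through sapply(). The sketch below is only an illustration under assumptions: the fields list and the extract_fields() helper are hypothetical names, only two of the roughly 12 fields are shown, and a tiny in-memory vector stands in for x$V1.

```r
library(stringi)

# Hypothetical field layout: name -> c(start, end) character positions.
# Only the two fields from the question are listed here.
fields <- list(Var1 = c(80L, 82L), Var2 = c(83L, 89L))

# Hypothetical helper: slice every field out of all lines at once.
extract_fields <- function(lines, fields) {
  # stri_sub() handles the whole vector in one call, so there is
  # no per-line sapply() loop.
  cols <- lapply(fields, function(p) stri_sub(lines, from = p[1], to = p[2]))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Tiny stand-in for x$V1: two lines of 1,800 characters each.
lines <- c(
  strrep("a", 1800),
  paste0(strrep(" ", 79), "ABC1234567", strrep(" ", 1711))
)

dataset <- extract_fields(lines, fields)
```

With the data from the question, the call would presumably be extract_fields(x$V1, fields), and the same helper could be applied to each of the 200 files in turn. Whether this gives enough of a speed-up on 1,800-character lines would need to be measured on the real files.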