Read multiple files, extract certain columns, delete certain rows, and write new files

Question

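I have thousands of files with the .txt extension in a single folder, all using whitespace (" ") as the separator. I need to:

Extract certain columns. I need to drop the last column and then, for example, select only columns 1, 2, 3 and 7. I have written this code with a loop: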

    library(dplyr)  # provides %>%, select(), rename_at() and filter()

    # Setting working directory
    workingdirectory <- "D:/FolderContainsThousandsFile"
    setwd(workingdirectory)

    # Listing the files in the folder with .txt extension
    # (the dot is escaped so it matches a literal ".")
    FilesList <- list.files(workingdirectory, pattern = "\\.txt$")

    # Looping over all files
    for (f in seq_along(FilesList)) {
      # read the file into a table and remove the last column
      FilterFile <- FilesList[f] %>%
        read.csv(sep = "", header = FALSE, stringsAsFactors = FALSE) %>%
        dplyr::select(-ncol(.))

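Delete specific rows. Each file contains several years of daily weather data, and I need to remove all data for February 29 using the following code: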

    # Remove the 29th day in February
    columnNames <- c("year", "month", "day", "weather")
    FilterFile <- FilterFile %>% rename_at(c(1,2,3,7), ~columnNames) # renaming columns to indicate the column to be taken
    FilterFile <- FilterFile %>% filter(month != 2 | day != 29)

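Finally, I need to export the result of the two steps above as one .txt file per input file, naming each new file after the original (for example, before_file1.txt becomes after_file1.txt).

Am I doing this right? If you know how to do each of these steps, please help.

Thank you in advance.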

Answer 1

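As Thomas Rosa pointed out, it would help to have more details about your files, your goal, and what you have tried so far...

However, the code you are looking for is probably something like this: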

    cl <- c(1, 3, 5, 7)   # Columns you want
    rw <- c(2, 5)         # Rows you do not want

    folder <- 'your directory'
    files <- list.files(path = folder, pattern = "\\.txt$")  # List the thousands of files

    for (file in files) {
      # You may want to skip some rows or include headers
      temporal <- read.table(file = file.path(folder, file), sep = "")
      temporal <- temporal[, cl]   # Just use the columns you want
      temporal <- temporal[-rw, ]  # Delete the undesired rows
      # Write next to the input, without row names, headers, or quotes
      write.table(x = temporal, file = file.path(folder, paste0('after_', file)),
                  row.names = FALSE, col.names = FALSE, quote = FALSE)
    }
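
The loop above drops rows by position. In the question, the rows to drop are defined by a date (29 February) and the columns wanted are 1, 2, 3 and 7, so a variant closer to that case might look like the sketch below. This is only an illustration under assumptions: the helper name `clean_weather_file` is hypothetical, the files are assumed to have no header row, and columns 1, 2, 3 and 7 are assumed to hold year, month, day and the weather value. The demo data at the bottom is made up purely to exercise the function.

```r
# Hypothetical helper: clean one weather file and write the result.
# Assumes no header row, whitespace-separated columns, and that
# columns 1, 2, 3 and 7 hold year, month, day and the weather value.
clean_weather_file <- function(in_path, out_dir = dirname(in_path)) {
  dat <- read.table(in_path, header = FALSE, stringsAsFactors = FALSE)

  # Keep the four columns of interest and give them readable names
  dat <- dat[, c(1, 2, 3, 7)]
  names(dat) <- c("year", "month", "day", "weather")

  # Drop every 29 February, whichever rows it falls on
  dat <- dat[!(dat$month == 2 & dat$day == 29), ]

  # before_file1.txt -> after_file1.txt; other names get an "after_" prefix
  in_name <- basename(in_path)
  out_name <- if (startsWith(in_name, "before_")) {
    sub("^before_", "after_", in_name)
  } else {
    paste0("after_", in_name)
  }
  out_path <- file.path(out_dir, out_name)

  write.table(dat, out_path, row.names = FALSE, col.names = FALSE, quote = FALSE)
  invisible(out_path)
}

# Small self-contained demo in a temporary folder (made-up data)
demo_dir <- file.path(tempdir(), "weather_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("2020 2 28 0 0 0 1.5 9",
             "2020 2 29 0 0 0 2.5 9",
             "2020 3 1 0 0 0 3.5 9"),
           file.path(demo_dir, "before_file1.txt"))

# Only pick up the inputs, so earlier outputs are never re-processed
inputs <- list.files(demo_dir, pattern = "^before_.*\\.txt$", full.names = TRUE)
outputs <- unname(vapply(inputs, clean_weather_file, character(1)))
result <- read.table(outputs[1], header = FALSE)
```

To run this on the real data, you would replace `demo_dir` with your own folder and skip the `writeLines()` call. Filtering on the `month` and `day` values rather than on row numbers means the same code works for every file, no matter where the leap days sit.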

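Note: I do not recommend using extra dots in file names, because you may run into trouble if your operating system treats a dot as the start of the file extension.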
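
If some file names do contain extra dots, one option is to split off the extension explicitly before building the new name. The sketch below uses `tools::file_path_sans_ext()` and `tools::file_ext()`, which ship with R; the helper name `make_output_name` and the example file names are hypothetical.

```r
# Hypothetical helper: build the output name for a given input file name.
make_output_name <- function(file_name) {
  stem <- tools::file_path_sans_ext(file_name)  # everything before the last dot
  ext  <- tools::file_ext(file_name)            # text after the last dot, "" if none
  stem <- sub("^before_", "", stem)             # drop an existing "before_" prefix
  if (nzchar(ext)) paste0("after_", stem, ".", ext) else paste0("after_", stem)
}

make_output_name("before_file1.txt")  # "after_file1.txt"
make_output_name("station.v2.txt")    # "after_station.v2.txt"
```

Only the part after the last dot is treated as the extension, so any dots earlier in the name are carried over unchanged.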
