Is there a way to skip the first X lines when opening a .csv in R, where X varies depending on where a specified header is found?

Question

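I'm trying to read multiple .csv files in a folder and combine all of the data into one data frame for analysis and plotting. Normally, I would use this approach to load and merge all the files: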

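    file_list <- list.files(file.path(WorkingDirectory, "Transducer Data"), pattern = "\\.csv$",
                            full.names = TRUE)

    all_transducer_file <- NULL
    for (file in file_list){
       # skip = 15 assumes every file has exactly 15 lines before the data
       one_file <- read.csv(file, header = FALSE, as.is = TRUE, sep = ",", skip = 15)
       # Append this file's rows instead of overwriting the previous file
       all_transducer_file <- rbind(all_transducer_file, one_file)
     }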

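However, I'm running into two problems:

1. The generated .csv files have a different number of lines before the data. The data headers are always "Date and Time", "Seconds", "Pressure (PSI)" and "Surface Water Level (ft)", but the number of lines before them depends on how many errors the device has thrown since the last data download.
2. The data sometimes loads as type "chr" and sometimes as type "factor". I don't really understand the difference between the two, or how it might affect my code.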

Is there a way to open a csv while skipping the first X lines, where X is based on where the specified header is found?

Thanks! Mel

Answer 1

Since you know `Date and Time` appears in the header, try this:

    library(data.table)
    fread(filename, skip = "Date and Time")

See `?fread` for other arguments you may or may not need.
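The call above reads one file, while the question wants every file in a single data frame. Here is a minimal sketch that applies the same `fread(..., skip = "Date and Time")` call to each file and stacks the results with `data.table::rbindlist()`. To keep it runnable on its own, it first writes two small demo files to `tempdir()`. The folder name `demo_dir`, the file names and their contents are made up, so point `list.files()` at your own `Transducer Data` folder instead.

```r
library(data.table)

# Hypothetical demo files with a different number of junk lines before the header
demo_dir <- file.path(tempdir(), "transducer_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Device log", "Error 1",
             "Date and Time,Seconds,Pressure (PSI)",
             "2020-01-01 00:00,0,14.7",
             "2020-01-01 00:01,60,14.8"),
           file.path(demo_dir, "a.csv"))
writeLines(c("Device log", "Error 1", "Error 2", "Error 3",
             "Date and Time,Seconds,Pressure (PSI)",
             "2020-01-02 00:00,0,15.1"),
           file.path(demo_dir, "b.csv"))

file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# A character skip makes fread start at the first line containing that text,
# and that line is used as the column header
tables <- lapply(file_list, fread, skip = "Date and Time")
names(tables) <- basename(file_list)

# Stack all files; idcol adds a "file" column recording where each row came from
all_transducer <- rbindlist(tables, idcol = "file")
print(all_transducer)
```

The `idcol = "file"` column is optional, but it makes it easy to plot or summarise each file separately after combining them.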

Answer 2

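Here is one way to solve both problems.

Problems and solutions:

- Not knowing where the skip should start -> use `grep` to find the line where the column names begin
- Some columns become factors, others character -> use `read_csv`, or set `stringsAsFactors = FALSE` in `read.csv`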
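To illustrate the second problem, this sketch reads the same tiny inline CSV (hypothetical values) once with `stringsAsFactors = TRUE` and once with `FALSE`. A character column stores the text itself, whereas a factor stores integer codes plus a set of labels (levels). That is why converting a factor straight to numbers returns the codes rather than the values. Since R 4.0.0 the default for `stringsAsFactors` is `FALSE`, so `read.csv()` returns character columns unless you are on an older R version or explicitly ask for factors.

```r
# Hypothetical inline data, read twice with different settings
csv_text <- "Date and Time,Seconds\n2020-01-01 00:00,0\n2020-01-01 00:01,60"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(as_factor$Date.and.Time)  # a factor with 2 levels
str(as_char$Date.and.Time)    # a plain character vector

# A factor is stored as integer codes pointing at its levels
as.integer(as_factor$Date.and.Time)  # 1 2

# Pitfall: a numeric column read as a factor converts to its codes, not its values
f <- factor(c("14.7", "14.8", "14.7"))
as.numeric(f)                # 1 2 1
as.numeric(as.character(f))  # 14.7 14.8 14.7
```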

Get the file names and the lines to skip

    # Set the path of the folder that contains the csv data
    file_list <-
      list.files(file.path(WorkingDirectory, "Transducer Data"), pattern = "\\.csv$",
                 full.names = TRUE)

    # Here we get the line at which the table we want starts
    # sapply is used to loop over each file
    # grep("Date and Time", readr::read_lines(x))[1] -> reads the file's lines and returns the first row containing "Date and Time"
    # We subtract one from this row number to use it as the skip value
    skip_lines <-
      sapply(file_list, function(x){grep("Date and Time", readr::read_lines(x), fixed = TRUE)[1] - 1},
             USE.NAMES = FALSE)
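If you would rather not depend on readr for this step, the same header search can be done in base R with `readLines()`. The helper name `find_header_skip()` and the demo file contents below are hypothetical. `fixed = TRUE` makes `grep()` treat the header text literally rather than as a regular expression, and the helper returns `NA` when the header is missing so that such files can be detected before reading.

```r
# Hypothetical helper: number of lines before the first line containing header_text
find_header_skip <- function(path, header_text = "Date and Time") {
  lines <- readLines(path, warn = FALSE)
  hit <- grep(header_text, lines, fixed = TRUE)
  if (length(hit) == 0) NA_integer_ else hit[1] - 1L
}

# Demo file (made-up content) with three lines before the header
demo_file <- tempfile(fileext = ".csv")
writeLines(c("Logger export", "Error 12", "Error 15",
             "Date and Time,Seconds,Pressure (PSI)",
             "2020-01-01 00:00,0,14.7"), demo_file)

skip_n <- find_header_skip(demo_file)
skip_n  # 3

# check.names = FALSE keeps the original column names such as "Pressure (PSI)"
dat <- read.csv(demo_file, skip = skip_n, check.names = FALSE)
str(dat)
```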

Read the data

    # Here I am using purrr to loop over the files, but you can use
    # a normal loop or the apply family; the benefit of map_df (a purrr function)
    # is that it returns the data as one data frame without needing to bind it yourself
    library(purrr)

    # Method one: using read.csv
    seq_along(file_list) %>% # loop over the file indices
      map_df(function(x){
        # Read each file, skipping the number of rows stored in skip_lines
        # stringsAsFactors = FALSE -> no column is converted to factor (text columns stay character)
        read.csv(file_list[x], skip = skip_lines[x], stringsAsFactors = FALSE)
      })

    # Method two: using read_csv
    seq_along(file_list) %>%
      map_df(function(x){
        # col_types = readr::cols() lets readr guess every column type without printing a message
        readr::read_csv(file_list[x], skip = skip_lines[x], col_types = readr::cols())
      })
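In purrr 1.0.0 and later, `map_df()` is superseded in favour of a `map()` variant followed by `list_rbind()`. A minimal sketch of the same workflow under that assumption: `map2()` walks `file_list` and `skip_lines` together, so no index vector is needed. The demo folder and file contents are made up so that the example runs on its own, and `read.csv()` is used to avoid needing readr.

```r
library(purrr)  # assumes purrr >= 1.0.0, which provides list_rbind()

# Hypothetical demo files with one and three junk lines before the header
demo_dir <- file.path(tempdir(), "transducer_map2_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Error A", "Date and Time,Seconds", "2020-01-01 00:00,0"),
           file.path(demo_dir, "a.csv"))
writeLines(c("Error A", "Error B", "Error C", "Date and Time,Seconds",
             "2020-01-02 00:00,0", "2020-01-02 00:01,60"),
           file.path(demo_dir, "b.csv"))

file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Lines before the header in each file (NA if the header is missing)
skip_lines <- map_int(file_list, function(x) {
  grep("Date and Time", readLines(x), fixed = TRUE)[1] - 1L
})

# map2() pairs each path with its skip value; list_rbind() stacks the data frames
combined <- list_rbind(map2(file_list, skip_lines, function(path, n) {
  read.csv(path, skip = n, stringsAsFactors = FALSE, check.names = FALSE)
}))
print(combined)
```

Unlike `map_dfr()`, `list_rbind()` does not require dplyr to be installed.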
