Is there a way to skip the first X lines when opening a .csv in R, where X varies depending on where a specified header is found?


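I am trying to read multiple .csv files from one folder and combine all of the data into a single data frame for analysis and graphing. Normally, I would use this approach to load and merge all of the files.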

    file_list <- list.files(paste(WorkingDirectory, "/Transducer Data", sep= ""), pattern = "*.csv", 
    full.names = TRUE)

    for (file in file_list){
       all_transducer_file <- read.csv(file, header = F, as.is = T, sep= ",", skip = 15) 
     }

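However, I have run into two problems.

  1. The generated .csv files have a varying number of lines before the data. The data headers are always: "Date and Time", "Seconds", "Pressure (PSI)", and "Surface Water Level (ft)". The number of lines depends on how many errors the equipment has thrown since the last data extraction.
  2. The data sometimes load as type "chr" and sometimes as type "factor". I don't really understand the difference between the two, or how it might affect my code.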

Is there a way to open a csv while skipping the first X lines, where X is based on where a specified header can be found?

Thanks! Mel

r read.csv
2 Answers
1
vote

Since you know that Date and Time appears in the header, try the following:

    library(data.table)
    fread(filename, skip = "Date and Time")

See ?fread for other arguments that you may or may not need.
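
To make the behaviour concrete, here is a minimal sketch that writes a throwaway file with a few preamble lines and then reads it with `fread`. The file contents and the reduced set of columns are invented for illustration; only the "Date and Time" header text comes from the question. It assumes data.table is installed.

```r
library(data.table)

# Build a small temporary file that mimics the logger output:
# preamble lines, then the header row, then data (all values are made up).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Logger report",
  "Error: sensor timeout",
  "Date and Time,Seconds,Pressure (PSI)",
  "2020-01-01 00:00:00,0,14.7",
  "2020-01-01 00:15:00,900,14.8"
), tmp)

# A character value for skip makes fread start reading at the first line
# that contains that text, so the preamble is ignored.
dt <- fread(tmp, skip = "Date and Time", header = TRUE)

print(names(dt))
print(nrow(dt))
```

Because the search is by text rather than by line number, the same call should work no matter how many error lines precede the header in a given file.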


0
votes

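So here is an approach to the problem at hand:

Problems and solutions:

  1. The number of lines to skip is unknown -> use grep to find the line that starts with the column names
  2. Some columns become factors and some characters -> use read_csv, or set stringsAsFactors = FALSE in read.csv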
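
On the second point, a short sketch of why the chr/factor difference matters in practice. The values are invented; the behaviour shown is standard base R. A factor stores integer codes plus a set of labels, so converting it to numeric directly returns the codes rather than the values.

```r
# The same three values stored as character and as factor
x_chr <- c("10.5", "3.2", "10.5")
x_fct <- factor(x_chr)

# Character: the text itself is parsed as numbers
from_chr <- as.numeric(x_chr)

# Factor: this returns the internal level codes, not the original values
from_fct_codes <- as.numeric(x_fct)

# Going through as.character() first recovers the original values
from_fct_values <- as.numeric(as.character(x_fct))

print(from_chr)
print(from_fct_codes)
print(from_fct_values)
```

Note that since R 4.0.0 the default for `stringsAsFactors` in `read.csv` is `FALSE`, so on recent versions text columns load as character unless you ask otherwise.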

Get the file names and the number of lines to skip

    # Build the list of csv files to read.
    # pattern is a regular expression, so "\\.csv$" matches names ending in .csv
    file_list <-
      list.files(paste(WorkingDirectory, "/Transducer Data", sep = ""), pattern = "\\.csv$",
                 full.names = TRUE)

    # Find the line at which the table starts in each file.
    # sapply loops over the files; for each one,
    # grep("Date and Time", readr::read_lines(x))[1] reads the lines and
    # returns the number of the first line that contains "Date and Time".
    # Subtracting one gives the number of lines to skip before the header.
    skip_lines <-
      sapply(file_list, function(x){grep("Date and Time", readr::read_lines(x))[1] - 1},
             USE.NAMES = FALSE)

Read the data

    # purrr is used here to loop over the files, but a plain loop or the
    # apply family works too. The benefit of map_df (from purrr; it needs
    # dplyr installed) is that it returns a single data frame without a
    # separate bind step.
    library(purrr)

    # Method one: read.csv
    seq_along(file_list) %>% # loop over the file indices
      map_df(function(x){
        # Skip the number of lines stored for this file in skip_lines.
        # stringsAsFactors = FALSE keeps text columns as character instead of factor.
        read.csv(file_list[x], skip = skip_lines[x], stringsAsFactors = FALSE)
      })

    # Method two: read_csv
    seq_along(file_list) %>%
      map_df(function(x){
        # readr::cols() lets readr guess column types without printing the spec;
        # it is namespaced because readr itself is not attached here.
        readr::read_csv(file_list[x], skip = skip_lines[x], col_types = readr::cols())
      })
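
If you would rather avoid extra packages, the same idea can be sketched in base R only. The helper name `read_after_header`, the `source_file` column, and the demo files below are hypothetical and invented for illustration; only the "Date and Time" header text comes from the question.

```r
# Hypothetical helper: read one file starting at its header row
read_after_header <- function(path, header_text = "Date and Time") {
  lines <- readLines(path, warn = FALSE)
  # Number of the first line containing the header text (NA if absent)
  header_row <- grep(header_text, lines, fixed = TRUE)[1]
  if (is.na(header_row)) {
    stop("Header text not found in ", path)
  }
  df <- read.csv(path, skip = header_row - 1,
                 stringsAsFactors = FALSE, check.names = FALSE)
  # Record which file each row came from (an addition for traceability)
  df$source_file <- basename(path)
  df
}

# Demo with two throwaway files that have different preamble lengths
demo_dir <- tempfile("transducer_")
dir.create(demo_dir)
writeLines(c("Error: low battery",
             "Date and Time,Seconds",
             "2020-01-01 00:00:00,0"),
           file.path(demo_dir, "a.csv"))
writeLines(c("Error: low battery", "Error: timeout", "Serial 123",
             "Date and Time,Seconds",
             "2020-01-02 00:00:00,0",
             "2020-01-02 00:15:00,900"),
           file.path(demo_dir, "b.csv"))

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then stack the results into one data frame
all_data <- do.call(rbind, lapply(demo_files, read_after_header))
print(all_data)
```

Collecting the per-file data frames in a list and binding them once also avoids the pitfall of a `for` loop that assigns to the same variable on every iteration and therefore keeps only the last file.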