R: use a loop to import multiple data files and save each one as a separate data frame

Question (votes: 0, answers: 1)

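I have 4 .txt files (all comma-separated) that I want to import into R with a loop, each as its own data frame. I am also trying to do some data formatting and add a column that identifies which file each data frame came from. I have hit some roadblocks and need help. Here is what I have so far, which sort of works: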

input_all_data<- 'Filepath'
dflist<- list.files(path = input_all_data,      
                    pattern = ".txt", full.names = TRUE)
myfiles = lapply(dflist, read.delim, header = FALSE)
for (i in length(myfiles)) {
  
  myfiles<- myfiles %>%
    map_dfr(read_csv,
            col_names = c('Frequency', 'Impedance', 'Admittance', 'Phase', 
                          'Resistance', 'Reactance', 'Conductance', 'Susceptance'), col_types = cols(
                            col_number(),
                            col_number(),
                            col_number(),
                            col_number(),
                            col_number(),
                            col_number(),
                            col_number(),
                            col_number()), skip= 5,  .id = 'Filename') %>%
    na.omit()
  
}

This script produces only one data frame with a generic name, and it does not actually format it either.
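One likely contributor is the loop header: `for (i in length(myfiles))` iterates over the single number returned by `length()`, so the body runs once, whereas `seq_along(myfiles)` visits every index. Separately, `map_dfr` row-binds its results into one table by design, which would explain the single combined data frame. The sketch below illustrates only the loop-header point; the four-element list is a hypothetical stand-in for the imported files.

```r
# Illustrative list standing in for four imported files (hypothetical)
myfiles <- list("a", "b", "c", "d")

# length(myfiles) is the single number 4, so this loop body runs once
visited_once <- c()
for (i in length(myfiles)) {
  visited_once <- c(visited_once, i)
}

# seq_along(myfiles) is 1, 2, 3, 4, so this loop visits every element
visited_all <- c()
for (i in seq_along(myfiles)) {
  visited_all <- c(visited_all, i)
}

print(visited_once)
print(visited_all)
```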

Here is a script that does what I want, but it can only handle one file at a time.

input_all_data<- 'Filepath'

dflist<- list.files(path = input_all_data,    
                        pattern = ".txt", full.names = TRUE) %>% 
  map_dfr(read_csv,
                   col_names = c('Frequency', 'Impedance', 'Admittance', 'Phase', 
                                 'Resistance', 'Reactance', 'Conductance', 'Susceptance'), col_types = cols(
                                   col_number(),
                                   col_number(),
                                   col_number(),
                                   col_number(),
                                   col_number(),
                                   col_number(),
                                   col_number(),
                                   col_number()), skip= 5, .id = 'Filename') %>%
  na.omit()

All four files have the same format. The file names look like "UW_APL_ES120-7CD_SN_###" or "US_APL_ES200-7CD_SN###", with two files of each type.

Representative data:

structure(list(Filename = c("1", "1", "1", "1", "2", "2", "2", 
"2", "3", "3", "3", "3", "4", "4", "4", "4"), Frequency = c(109999.953, 
110050.086, 110100.148, 110150.281, 110200.461, 110250.594, 110300.656, 
110350.789, 110400.977, 110451.039, 110501.172, 110551.297, 110601.484, 
110651.555, 110701.68, 110751.75), Impedance = c(69.4809113, 
69.9489136, 70.1840897, 69.4809113, 69.016037, 69.016037, 69.016037, 
69.016037, 69.016037, 68.7847672, 68.3245621, 68.3245621, 68.5542831, 
68.5542831, 68.0956116, 67.4133606), Admittance = c(0.0143924422, 
0.0142961477, 0.0142482435, 0.0143924422, 0.014489386, 0.014489386, 
0.014489386, 0.014489386, 0.014489386, 0.0145381026, 0.014636025, 
0.014636025, 0.0145869806, 0.0145869806, 0.0146852341, 0.0148338548
), Phase = c(-3.3277061, -3.19584394, -3.23976398, -3.415519, 
-3.19576693, -2.80023408, -2.53653598, -2.44862008, -2.71226692, 
-2.62435102, -2.31670809, -1.96511996, -1.83325803, -1.96506906, 
-1.96504295, -1.74529099), Resistance = c(69.363757, 69.8401298, 
70.0719203, 69.3574942, 68.9087092, 68.9336275, 68.9484153, 68.9530209, 
68.9387229, 68.7126257, 68.268717, 68.2843796, 68.5191943, 68.5139676, 
68.0555668, 67.3820874), Reactance = c(-4.03314324, -3.89958764, 
-3.96641324, -4.13944704, -3.84748801, -3.37169914, -3.05440442, 
-2.94860491, -3.26586027, -3.14948611, -2.76189519, -2.34292368, 
-2.19311524, -2.35074019, -2.33498124, -2.05316583), Conductance = c(0.0143681746, 
0.0142739145, 0.0142254717, 0.0143668773, 0.0144668533, 0.0144720848, 
0.0144751893, 0.0144761563, 0.0144731545, 0.014522855, 0.0146240623, 
0.0146274174, 0.0145795144, 0.0145784023, 0.0146765983, 0.0148269733
), Susceptance = c(0.000835434941, 0.000796997095, 0.000805231242, 
0.000857454964, 0.000807750508, 0.000707862296, 0.000641248709, 
0.000619036916, 0.000685642239, 0.000665664128, 0.000591634484, 
0.000501885245, 0.000466651069, 0.000500190507, 0.000503552953, 
0.000451785277)), row.names = c(NA, -16L), class = c("tbl_df", 
"tbl", "data.frame"), na.action = structure(c(`401` = 401L, `402` = 402L, 
`403` = 403L, `404` = 404L, `405` = 405L, `806` = 806L, `807` = 807L, 
`808` = 808L, `809` = 809L, `810` = 810L, `1211` = 1211L, `1212` = 1212L, 
`1213` = 1213L, `1214` = 1214L, `1215` = 1215L, `2016` = 2016L, 
`2017` = 2017L, `2018` = 2018L, `2019` = 2019L, `2020` = 2020L, 
`2421` = 2421L, `2422` = 2422L, `2423` = 2423L, `2424` = 2424L, 
`2425` = 2425L, `2826` = 2826L, `2827` = 2827L, `2828` = 2828L, 
`2829` = 2829L, `2830` = 2830L, `3631` = 3631L, `3632` = 3632L, 
`3633` = 3633L, `3634` = 3634L, `3635` = 3635L, `4036` = 4036L, 
`4037` = 4037L, `4038` = 4038L, `4039` = 4039L, `4040` = 4040L, 
`4441` = 4441L, `4442` = 4442L, `4443` = 4443L, `4444` = 4444L, 
`4445` = 4445L, `5246` = 5246L, `5247` = 5247L, `5248` = 5248L, 
`5249` = 5249L, `5250` = 5250L, `5651` = 5651L, `5652` = 5652L, 
`5653` = 5653L, `5654` = 5654L, `5655` = 5655L, `6056` = 6056L, 
`6057` = 6057L, `6058` = 6058L, `6059` = 6059L, `6060` = 6060L
), class = "omit"))

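I have tried a few loops but cannot get one to do what I am looking for: separate the files into their own data frames, reference the source file, and format the data. I think I am 70% of the way there but need help getting the loop right.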

r dplyr tidyverse rstudio
1 Answer
0 votes

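What you want to do is collect the file names and create an empty list for the data frames. Then loop over the file names, appending each read.csv result to the list of data frames.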

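# full.names = TRUE returns complete paths, so read.csv can find each file
# regardless of the current working directory
filenames <- list.files("/directory/", pattern = "\\.txt$", full.names = TRUE)
dfs <- list()
for (file in filenames) {
    # Read one file and append it to the end of the list
    newdf <- read.csv(file)
    dfs[[length(dfs) + 1]] <- newdf
}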
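The same loop can be adapted to the layout described in the question. The following is a minimal sketch, not a definitive implementation: it assumes each file has 5 lines to skip followed by eight comma-separated numeric columns, and it writes two small stand-in files (`demo_A.txt`, `demo_B.txt`, both hypothetical) to a temporary directory so that it runs on its own. Each data frame gets a `Filename` column and is stored in the list under its file name.

```r
# Column names taken from the question
col_names <- c("Frequency", "Impedance", "Admittance", "Phase",
               "Resistance", "Reactance", "Conductance", "Susceptance")

# Hypothetical stand-in files: 5 lines to skip, then comma-separated rows
data_dir <- file.path(tempdir(), "impedance_demo")
dir.create(data_dir, showWarnings = FALSE)
for (id in c("demo_A", "demo_B")) {
  writeLines(c(paste("header line", 1:5),
               "1,2,3,4,5,6,7,8",
               "9,10,11,12,13,14,15,16"),
             file.path(data_dir, paste0(id, ".txt")))
}

filenames <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
dfs <- list()
for (file in filenames) {
  # Skip the 5 leading lines and apply the column names
  newdf <- read.csv(file, header = FALSE, skip = 5, col.names = col_names)
  newdf <- na.omit(newdf)
  # Record the source file (no directory, no extension) in a new column
  label <- tools::file_path_sans_ext(basename(file))
  newdf$Filename <- label
  # Store under the file name so each data frame can be looked up by name
  dfs[[label]] <- newdf
}

print(names(dfs))
print(dfs[["demo_A"]])
```

Keeping the data frames in a named list, rather than creating four separate variables, means later steps can still loop over them while `dfs[["demo_A"]]` retrieves any single one.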

You can then access your data frames through the list.

firstdataframe <- dfs[[1]]
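
As an alternative that is not part of the answer above: if the combined table from the question's `map_dfr` call already exists, base R's `split()` can break it into one data frame per `Filename` value. The sketch below uses a small stand-in table built from a few of the question's sample values.

```r
# Small stand-in for the combined table produced by map_dfr() in the question
combined <- data.frame(
  Filename  = c("1", "1", "2", "2"),
  Frequency = c(109999.953, 110050.086, 110200.461, 110250.594),
  Impedance = c(69.4809113, 69.9489136, 69.016037, 69.016037)
)

# split() returns a named list with one data frame per Filename value
by_file <- split(combined, combined$Filename)

print(names(by_file))
print(by_file[["2"]])
```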