Reading multiple Excel files into R when the data does not start in cell A1

Question

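I have many Excel workbooks that each contain data in cells B10:N13.

I want to read every Excel file in my folder, take the column names from B10:N10, and take the % positive data from B13:N13. I also want to create a date field from the date value in cell B2. I will then append the results to get a single combined data frame (an R data frame, of course, not an Excel table; I only used Excel to illustrate it).

I know how to get a list of all the Excel files in my folder: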

# Define the folder path containing Excel files
folder_path <- "O:/Surveillance/3. Dissemination/Reports/Resp data/Respiratory data/"

# Get a list of Excel files in the folder
file_list <- list.files(path = folder_path, pattern = '*.xlsx', full.names = TRUE)

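But I don't know how to write a function that loops over all the Excel files, extracts this information, and binds the results together. I did find a messy way to pull out what I want from a single Excel file, but I'm sure there is a cleaner way to do it that can then be put into a function or loop.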

# Read in data from table in Excel
test <- read_excel("O:/Surveillance/3. Dissemination/Reports/Resp data/Respiratory data/Respiratory summary WE 2022-07-03.xlsx", 
                   sheet = 'Summary', 
                   range = 'B10:N13')

# Keep only the % Positive row
test_filtered <- test[3, ]

# Extract the date from cell B1
date_cell <- read_excel("O:/Surveillance/3. Dissemination/Reports/Resp data/Respiratory data/Respiratory summary WE 2022-07-03.xlsx",
                        sheet = 'Summary',
                        range = 'B1:B1',
                        col_names = FALSE)

date <- date_cell[1, 1]  

# Add date column to other data and rename
test_filtered <- cbind(date = date, test_filtered) |> 
  rename('date' = '...1') 

# Extract date portion of string
test_filtered <- test_filtered %>%
  mutate(date = sub("^.*\\b(\\d{2}-\\d{2}-\\d{4})$", "\\1", date))

Any guidance would be appreciated. Please be kind, I'm still learning :)

Answer 1

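Here is how I would do it with purrr::map() and the dplyr package. As an example, I use some of the files that ship with the readxl package.

map() repeats the same operations for each element (xl) of a list, here the list of file names. Those operations are read_excel(), which reads the sheet, and dplyr::mutate(), which adds a specific cell as a new column. The resulting list of data frames is then combined with purrr::list_rbind().

I think you will be able to adapt your code from this :)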

library(readxl)
library(tidyverse)

# Listing files
# Examples from the readxl package, could be from a specific folder. Choosing 3
list_of_xl= list.files( system.file("extdata",package="readxl"), full.names = TRUE )[c(2,4,6)]
# Alternative in a more realistic case:
#list_of_xl = list.files("MyExcelFolder", full.names = TRUE)

merged = list_of_xl %>% 
  map( \(xl) read_excel(xl) %>%
         #Adding  the cell A2 as a column of every file. Specifying "text" for an easy bind
         mutate(A2Cell = read_excel(xl, range="A2:A2", col_names = FALSE , col_types = "text")) 
  ) %>% 
  list_rbind()
#> New names:
#> New names:
#> New names:
#> New names:
#> • `` -> `...1`

merged
#> # A tibble: 172 × 14
#>    name      value A2Cell$...1 Sepal.Length Sepal.Width Petal.Length Petal.Width
#>    <chr>     <chr> <chr>              <dbl>       <dbl>        <dbl>       <dbl>
#>  1 Name      Clip… Name                NA          NA           NA          NA  
#>  2 Species   pape… Name                NA          NA           NA          NA  
#>  3 Approx d… 39083 Name                NA          NA           NA          NA  
#>  4 Weight i… 0.9   Name                NA          NA           NA          NA  
#>  5 <NA>      <NA>  5.1                  5.1         3.5          1.4         0.2
#>  6 <NA>      <NA>  5.1                  4.9         3            1.4         0.2
#>  7 <NA>      <NA>  5.1                  4.7         3.2          1.3         0.2
#>  8 <NA>      <NA>  5.1                  4.6         3.1          1.5         0.2
#>  9 <NA>      <NA>  5.1                  5           3.6          1.4         0.2
#> 10 <NA>      <NA>  5.1                  5.4         3.9          1.7         0.4
#> # ℹ 162 more rows
#> # ℹ 7 more variables: Species <chr>, `Lots of people` <chr>, ...2 <chr>,
#> #   ...3 <chr>, ...4 <chr>, ...5 <chr>, ...6 <chr>
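
A note on the output above: the column prints as A2Cell$...1 because read_excel() returns a tibble even when the range is a single cell, so mutate() stores a data-frame column rather than a plain vector. The following is a minimal sketch of that difference. It uses a hand-made one-cell tibble (the names cell, dat and value are placeholders, not part of the answer) in place of a real workbook, and shows that pulling the value out with [[1]] gives an ordinary character column.

```r
library(tibble)
library(dplyr)

# Stand-in for the 1x1 tibble that read_excel() returns for a one-cell range
cell <- tibble(value = "Name")

# Stand-in for the data read from the rest of the sheet
dat <- tibble(x = 1:3)

# Assigning the tibble itself creates a nested data-frame column
nested <- dat |> mutate(A2Cell = cell)

# Extracting the first column first creates a plain character column,
# recycled to every row
flat <- dat |> mutate(A2Cell = cell[[1]])

is.data.frame(nested$A2Cell)
is.character(flat$A2Cell)
```

If a flat column is preferred, the same [[1]] can be appended to the read_excel() call inside mutate() in the answer's code.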

Created on 2024-04-18 with reprex v2.1.0
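
To connect the answer back to the workbook layout described in the question, here is a rough sketch of how the same map() and list_rbind() pattern might look there. Several things are assumptions rather than facts from the thread: the helper names read_summary() and extract_date() are made up for illustration, the folder path is a placeholder, the date cell defaults to B1:B1 as in the question's code (the question's text mentions B2, so it is left as a parameter), and the date cell is assumed to hold text ending in a two-digit, two-digit, four-digit date, as the question's regular expression implies.

```r
library(readxl)
library(dplyr)
library(purrr)

# Pull the trailing dd-dd-dddd token out of a string (regex taken from the question)
extract_date <- function(x) {
  sub("^.*\\b(\\d{2}-\\d{2}-\\d{4})$", "\\1", x)
}

# Hypothetical helper: read one workbook and return its third data row
read_summary <- function(path, sheet = "Summary", date_cell = "B1:B1") {
  # B10:N10 becomes the header, so sheet rows 11-13 are data rows 1-3
  tbl <- read_excel(path, sheet = sheet, range = "B10:N13")

  # One-cell read; [[1]] turns the 1x1 tibble into a plain character value
  date_raw <- read_excel(path, sheet = sheet, range = date_cell,
                         col_names = FALSE, col_types = "text")[[1]]

  tbl |>
    slice(3) |>                                        # keep only sheet row 13
    mutate(date = extract_date(date_raw), .before = 1) # date as first column
}

# Placeholder folder; "\\.xlsx$" is a regular expression matching names ending in .xlsx
folder_path <- "path/to/workbooks"
file_list <- list.files(folder_path, pattern = "\\.xlsx$", full.names = TRUE)

# One row per workbook, stacked into a single data frame
combined <- file_list |>
  map(read_summary) |>
  list_rbind()
```

list_rbind() needs each column to have a compatible type across files. If a column comes through as numeric in one workbook and text in another, the bind will fail, and reading everything with col_types = "text" and converting afterwards is one possible workaround.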
