Convert mixed datetime formats to POSIXct


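I have hundreds of Excel files with multiple columns that hold mixed datetime formats, including decimals. Some variable names are shared between files and some are not. I need to bind all of these datasets together and make sure every datetime column (currently read in as character) is POSIXct.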

Sample data and goal:

df1 <- tibble::tribble(
               ~v1, ~v2,              ~v3, ~v4,
  "3/8/2022 22:59", "A",    "44629.85139", "E",
    "3/9/22 19:24", "B",    "44629.88056", "F",
     "44629.81389", "C", "3/8/2022 20:25", "G",
     "44629.81389", "D", "3/8/2022 20:31", "H"
  )

df2 <- tibble::tribble(
               ~v1, ~v2,              ~v5, ~v6,
  "3/8/2022 22:59", "I",    "44629.85139", "M",
     "44629.81389", "J", "3/7/2022 21:38", "N",
     "44629.81389", "K", "3/7/2022 21:44", "O",
     "44629.81389", "L", "3/7/2022 21:48", "P"
  )

goal <- tibble::tribble(
                    ~v1, ~v2,                   ~v3, ~v4,                   ~v5, ~v6,
  "2022-03-08 22:59:00", "A", "2022-03-09 20:26:00", "E",                    NA,  NA,
  "2022-03-09 19:24:00", "B", "2022-03-09 21:08:00", "F",                    NA,  NA,
  "2022-03-09 19:32:00", "C", "2022-03-08 20:25:00", "G",                    NA,  NA,
  "2022-03-09 19:32:00", "D", "2022-03-08 20:31:00", "H",                    NA,  NA,
  "2022-03-08 22:59:00", "I",                    NA,  NA, "2022-03-09 20:26:00", "M",
  "2022-03-09 19:32:00", "J",                    NA,  NA, "2022-03-07 21:38:00", "N",
  "2022-03-09 19:32:00", "K",                    NA,  NA, "2022-03-07 21:44:00", "O",
  "2022-03-09 19:32:00", "L",                    NA,  NA, "2022-03-07 21:48:00", "P"
  )

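The solution here converts the decimal times to the correct format for individual columns, but of course it does not work for the other formats: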

> as.POSIXct("1900-01-01") + as.difftime(as.numeric(df1$v1), units = "days")
[1] NA                       
[2] NA                       
[3] "2022-03-11 19:32:00 EST"
[4] "2022-03-11 19:32:00 EST"
Warning message:
In as.difftime(as.numeric(df1$v1), units = "days") :  NAs introduced by coercion
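A note on the origin used in that attempt: the decimals look like Excel serial dates (1900 date system), and for those the usual R origin is 1899-12-30 rather than 1900-01-01, because Excel counts 1900-01-01 as day 1 and also counts a nonexistent 1900-02-29. That would explain why the output above lands on 2022-03-11 while the goal shows 2022-03-09. A minimal base R sketch, assuming the values really are Excel serials and that UTC is an acceptable time zone (both are assumptions, not stated in the question):

```r
# Assumption: the decimal is an Excel serial date (days since 1899-12-30).
serial <- as.numeric("44629.81389")

# as.POSIXct() on a number expects seconds, so convert days to seconds.
# round() drops the sub-second residue left by the 5-decimal serial.
excel_dt <- as.POSIXct(round(serial * 86400), origin = "1899-12-30", tz = "UTC")

format(excel_dt, "%Y-%m-%d %H:%M:%S")
#> [1] "2022-03-09 19:32:00"
```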

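How can I convert multiple mixed-format datetime columns for each dataset when the variables differ across files? I tried lubridate::parse_date_time but could not figure out how to fit the decimal conversion into the orders argument. Any help is much appreciated!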

r datetime tidyverse lubridate
1 Answer
0
votes
library(dplyr)
library(lubridate)

bind_rows(df1, df2) %>% 
  rowwise() %>% 
  mutate(across(c("v1", "v3", "v5"), 
                # Each parser returns NA for a format it cannot read, so
                # pmax(..., na.rm = TRUE) keeps the one that succeeded.
                ~pmax(mdy_hm(.x),
                      ymd_hms(.x),
                      # Excel serials count days; as.POSIXct() expects seconds.
                      as.POSIXct(as.numeric(.x) * 86400, origin="1899-12-30", tz="GMT"),
                      na.rm = TRUE)))

#> # A tibble: 8 x 6
#> # Rowwise: 
#>   v1                  v2    v3                  v4    v5                  v6   
#>   <dttm>              <chr> <dttm>              <chr> <dttm>              <chr>
#> 1 2022-03-08 22:59:00 A     2022-03-09 20:26:00 E     NA                  <NA> 
#> 2 2022-03-09 19:24:00 B     2022-03-09 21:08:00 F     NA                  <NA> 
#> 3 2022-03-09 19:32:00 C     2022-03-08 20:25:00 G     NA                  <NA> 
#> 4 2022-03-09 19:32:00 D     2022-03-08 20:31:00 H     NA                  <NA> 
#> 5 2022-03-08 22:59:00 I     NA                  <NA>  2022-03-09 20:26:00 M    
#> 6 2022-03-09 19:32:00 J     NA                  <NA>  2022-03-07 21:38:00 N    
#> 7 2022-03-09 19:32:00 K     NA                  <NA>  2022-03-07 21:44:00 O    
#> 8 2022-03-09 19:32:00 L     NA                  <NA>  2022-03-07 21:48:00 P

Created on 2023-03-23 by the reprex package (v2.0.1)
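Since the question mentions hundreds of files whose columns differ, a vectorized helper combined with any_of() may be more convenient than rowwise(). The sketch below is not the answer's method: parse_mixed_datetime and datetime_cols are hypothetical names, and it swaps lubridate's format guessing for explicit base R strptime formats so the behavior with mixed two- and four-digit years is predictable. It assumes text dates are always month/day/year hour:minute, decimals are Excel serials, and UTC is acceptable.

```r
library(dplyr)

# Hypothetical helper: convert a character vector that mixes "m/d/y H:M"
# text with Excel serial numbers into POSIXct (UTC).
parse_mixed_datetime <- function(x) {
  # Serial branch: text dates become NA, serials become days -> seconds.
  serial <- suppressWarnings(as.numeric(x))
  from_serial <- as.POSIXct(round(serial * 86400), origin = "1899-12-30", tz = "UTC")

  # Text branch: try two-digit years first, because %Y would also accept
  # "22" and read it as the year 22.
  short_year <- as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "UTC")
  long_year  <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

  # Keep the first branch that produced a value for each element.
  coalesce(from_serial, short_year, long_year)
}

# Small stand-in for one file; v3 and v5 are deliberately absent.
df <- data.frame(
  v1 = c("3/8/2022 22:59", "3/9/22 19:24", "44629.81389"),
  v2 = c("A", "B", "C")
)

# any_of() silently skips names that this particular file does not contain.
datetime_cols <- c("v1", "v3", "v5")
result <- df %>%
  mutate(across(any_of(datetime_cols), parse_mixed_datetime))

format(result$v1, "%Y-%m-%d %H:%M:%S")
#> [1] "2022-03-08 22:59:00" "2022-03-09 19:24:00" "2022-03-09 19:32:00"
```

The same mutate() call can be applied to each file before bind_rows(), or once to the bound result, since columns that are entirely NA stay NA.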
