Combining two data frames by time in R


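I have two data frames (A and B). Both have a date-time column, but they share no other columns. Data frame B contains far fewer time values (A and B each cover a single day). How can I join A and B so that all rows of B are "zipped" into data frame A? All other columns should be kept and show NA where no value is available. I need this because I want to create a ggplot with facet_wrap. I have tried joins, bind_rows, and the nearestTime function from the bayesbio package, but none of them seems to work for this case.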

 Info on dataframe A:
 structure(list(time = structure(c(-2209038596.86, 
-2209038594.86, -2209038592.86, -2209038590.86, -2209038588.86, 
-2209038586.86, -2209038584.86, -2209038582.86, -2209038580.86, 
-2209038578.86), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), pH = c(7.13642, 7.12116, 7.12116, 7.13133, 7.12625, 7.12625, 
7.12625, 7.13133, 7.11608, 7.12625), oxygen = c(6.13248, 
6.11996, 6.10745, 6.11996, 6.11996, 6.09493, 6.09493, 6.08242, 
6.08242, 6.08242)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Info on dataframe B:
structure(list(time = structure(c(-2209038180, -2209035240, -2209034340, 
-2209033440, -2209031640), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), DOC = c(5001, 36787, 20835, 13085, 8344), Temp = c(22.2, 
20, 22.5, 22.6, 23), conductivity = c(2.16, 2.29, 2.37, 2.42, 
2.49), flow = c(15, 15, 15, 15, 15)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

Note that, for unknown reasons, all times are set in 1899, which means the POSIXct values are negative numbers.

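After merging, the result should contain all time values and all other columns. (The full result is too large to show, because data frame A has hundreds of rows before the first time in data frame B.)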

This is the desired output, shown for data frame B only (image not reproduced):

pH and oxygen should now be added to it. It would also be nice if the x-axis showed hours and minutes instead of the actual time of day.

r lubridate
1 Answer

What is wrong with a simple merge()?

dtfa <- structure(list(time = structure(c(-2209038596.86, 
-2209038594.86, -2209038592.86, -2209038590.86, -2209038588.86, 
-2209038586.86, -2209038584.86, -2209038582.86, -2209038580.86, 
-2209038578.86), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), pH = c(7.13642, 7.12116, 7.12116, 7.13133, 7.12625, 7.12625, 
7.12625, 7.13133, 7.11608, 7.12625), oxygen = c(6.13248, 
6.11996, 6.10745, 6.11996, 6.11996, 6.09493, 6.09493, 6.08242, 
6.08242, 6.08242)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

dtfb <- structure(list(time = structure(c(-2209038180, -2209035240, -2209034340, 
-2209033440, -2209031640), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), DOC = c(5001, 36787, 20835, 13085, 8344), Temp = c(22.2, 
20, 22.5, 22.6, 23), conductivity = c(2.16, 2.29, 2.37, 2.42, 
2.49), flow = c(15, 15, 15, 15, 15)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

merge(dtfa, dtfb, all=TRUE)
#                   time      pH  oxygen   DOC Temp conductivity flow
# 1  1899-12-31 10:10:04 7.13642 6.13248    NA   NA           NA   NA
# 2  1899-12-31 10:10:06 7.12116 6.11996    NA   NA           NA   NA
# 3  1899-12-31 10:10:08 7.12116 6.10745    NA   NA           NA   NA
# 4  1899-12-31 10:10:10 7.13133 6.11996    NA   NA           NA   NA
# 5  1899-12-31 10:10:12 7.12625 6.11996    NA   NA           NA   NA
# 6  1899-12-31 10:10:14 7.12625 6.09493    NA   NA           NA   NA
# 7  1899-12-31 10:10:16 7.12625 6.09493    NA   NA           NA   NA
# 8  1899-12-31 10:10:18 7.13133 6.08242    NA   NA           NA   NA
# 9  1899-12-31 10:10:20 7.11608 6.08242    NA   NA           NA   NA
# 10 1899-12-31 10:10:22 7.12625 6.08242    NA   NA           NA   NA
# 11 1899-12-31 10:17:00      NA      NA  5001 22.2         2.16   15
# 12 1899-12-31 11:06:00      NA      NA 36787 20.0         2.29   15
# 13 1899-12-31 11:21:00      NA      NA 20835 22.5         2.37   15
# 14 1899-12-31 11:36:00      NA      NA 13085 22.6         2.42   15
# 15 1899-12-31 12:06:00      NA      NA  8344 23.0         2.49   15
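
The call above works because merge() joins on the column names the two data frames share (here only time), and all=TRUE keeps the unmatched rows from both sides, padding them with NA. As a rough sketch of how the rest of the question might be approached, the block below uses small subsets of the question's data, names the key explicitly, and adds an elapsed-minutes column for the x-axis. The names elapsed_min, vars and long are made up for illustration, and the ggplot2 part is only an assumption about how the facet plot could be built; it is skipped if ggplot2 is not installed.

```r
# Small subsets of the question's data (first three rows of A, first two of B).
dtfa <- data.frame(
  time = as.POSIXct(c(-2209038596.86, -2209038594.86, -2209038592.86),
                    origin = "1970-01-01", tz = "UTC"),
  pH = c(7.13642, 7.12116, 7.12116),
  oxygen = c(6.13248, 6.11996, 6.10745)
)
dtfb <- data.frame(
  time = as.POSIXct(c(-2209038180, -2209035240),
                    origin = "1970-01-01", tz = "UTC"),
  DOC = c(5001, 36787),
  Temp = c(22.2, 20)
)

# Full outer join on the shared column; naming `by` makes the key explicit.
merged <- merge(dtfa, dtfb, by = "time", all = TRUE)

# Minutes elapsed since the first observation (hypothetical column name),
# as one possible replacement for the time-of-day x-axis.
merged$elapsed_min <- as.numeric(
  difftime(merged$time, min(merged$time), units = "mins")
)

# Long format in base R: one row per (time, variable) pair,
# dropping the NA padding introduced by the outer join.
vars <- c("pH", "oxygen", "DOC", "Temp")
long <- do.call(rbind, lapply(vars, function(v) {
  data.frame(elapsed_min = merged$elapsed_min,
             variable = v,
             value = merged[[v]])
}))
long <- long[!is.na(long$value), ]

# One panel per variable, built only if ggplot2 is available.
# Call print(p) to draw the plot.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(elapsed_min, value)) +
    ggplot2::geom_point() +
    ggplot2::facet_wrap(~ variable, scales = "free_y")
}
```

Dropping the NA rows after reshaping means each panel only plots the times at which that variable was actually measured, so the sparse B columns and the dense A columns can share one plot.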