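I have two data frames (A and B). Both have a date-time column, but otherwise they share no columns. Data frame B contains far fewer time values (A and B each cover only a single day). How can I join A and B so that all rows of B are "zipped" into data frame A? All other columns should be kept and show NA where no value is available. I need this because I want to create a ggplot with facet_wrap. I have tried joins, bind_rows, and the nearestTime function from the bayesbio package, but none of them seems to work for this case.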
Info on dataframe A:
structure(list(time = structure(c(-2209038596.86,
-2209038594.86, -2209038592.86, -2209038590.86, -2209038588.86,
-2209038586.86, -2209038584.86, -2209038582.86, -2209038580.86,
-2209038578.86), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), pH = c(7.13642, 7.12116, 7.12116, 7.13133, 7.12625, 7.12625,
7.12625, 7.13133, 7.11608, 7.12625), oxygen = c(6.13248,
6.11996, 6.10745, 6.11996, 6.11996, 6.09493, 6.09493, 6.08242,
6.08242, 6.08242)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Info on dataframe B:
structure(list(time = structure(c(-2209038180, -2209035240, -2209034340,
-2209033440, -2209031640), tzone = "UTC", class = c("POSIXct",
"POSIXt")), DOC = c(5001, 36787, 20835, 13085, 8344), Temp = c(22.2,
20, 22.5, 22.6, 23), conductivity = c(2.16, 2.29, 2.37, 2.42,
2.49), flow = c(15, 15, 15, 15, 15)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
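Note that, for unknown reasons, all times are set in the year 1899, which means negative numbers in POSIXct.
After merging, the result should contain all time values (this is too complex to show here, because there are hundreds of rows in A before the first time in data frame B occurs) and also all other columns.
Right now, pH and oxygen still need to be added there. Also, it would be great if the x-axis showed hours and minutes rather than the actual time of day.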
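What's wrong with a simple merge()? Since time is the only column the two data frames share, merge() joins on it by default, and all=TRUE keeps every row from both sides, filling the missing columns with NA.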
dtfa <- structure(list(time = structure(c(-2209038596.86,
-2209038594.86, -2209038592.86, -2209038590.86, -2209038588.86,
-2209038586.86, -2209038584.86, -2209038582.86, -2209038580.86,
-2209038578.86), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), pH = c(7.13642, 7.12116, 7.12116, 7.13133, 7.12625, 7.12625,
7.12625, 7.13133, 7.11608, 7.12625), oxygen = c(6.13248,
6.11996, 6.10745, 6.11996, 6.11996, 6.09493, 6.09493, 6.08242,
6.08242, 6.08242)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
dtfb <- structure(list(time = structure(c(-2209038180, -2209035240, -2209034340,
-2209033440, -2209031640), tzone = "UTC", class = c("POSIXct",
"POSIXt")), DOC = c(5001, 36787, 20835, 13085, 8344), Temp = c(22.2,
20, 22.5, 22.6, 23), conductivity = c(2.16, 2.29, 2.37, 2.42,
2.49), flow = c(15, 15, 15, 15, 15)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
merge(dtfa, dtfb, all=TRUE)
# time pH oxygen DOC Temp conductivity flow
# 1 1899-12-31 10:10:04 7.13642 6.13248 NA NA NA NA
# 2 1899-12-31 10:10:06 7.12116 6.11996 NA NA NA NA
# 3 1899-12-31 10:10:08 7.12116 6.10745 NA NA NA NA
# 4 1899-12-31 10:10:10 7.13133 6.11996 NA NA NA NA
# 5 1899-12-31 10:10:12 7.12625 6.11996 NA NA NA NA
# 6 1899-12-31 10:10:14 7.12625 6.09493 NA NA NA NA
# 7 1899-12-31 10:10:16 7.12625 6.09493 NA NA NA NA
# 8 1899-12-31 10:10:18 7.13133 6.08242 NA NA NA NA
# 9 1899-12-31 10:10:20 7.11608 6.08242 NA NA NA NA
# 10 1899-12-31 10:10:22 7.12625 6.08242 NA NA NA NA
# 11 1899-12-31 10:17:00 NA NA 5001 22.2 2.16 15
# 12 1899-12-31 11:06:00 NA NA 36787 20.0 2.29 15
# 13 1899-12-31 11:21:00 NA NA 20835 22.5 2.37 15
# 14 1899-12-31 11:36:00 NA NA 13085 22.6 2.42 15
# 15 1899-12-31 12:06:00 NA NA 8344 23.0 2.49 15
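For the facet_wrap plot mentioned in the question, the merged result probably needs to be reshaped to long format first. The following is only a sketch under some assumptions: it uses tidyr and ggplot2 (neither appears in the answer above), the small data frame `merged` is a hypothetical stand-in built from a few rows of the printed output, and the names `long`, `variable` and `value` are made up for illustration. Labelling the x-axis with "%H:%M" is one possible reading of "hours and minutes".

```r
library(tidyr)    # pivot_longer()
library(ggplot2)

# Hypothetical stand-in for merge(dtfa, dtfb, all=TRUE);
# a few rows and columns copied from the printed output above.
merged <- data.frame(
  time   = as.POSIXct(c("1899-12-31 10:10:04", "1899-12-31 10:10:06",
                        "1899-12-31 10:17:00", "1899-12-31 11:06:00"),
                      tz = "UTC"),
  pH     = c(7.13642, 7.12116, NA, NA),
  oxygen = c(6.13248, 6.11996, NA, NA),
  DOC    = c(NA, NA, 5001, 36787),
  Temp   = c(NA, NA, 22.2, 20)
)

# Long format: one row per (time, variable) pair.
# The NA cells created by the outer join are dropped here.
long <- pivot_longer(merged, cols = -time,
                     names_to = "variable", values_to = "value",
                     values_drop_na = TRUE)

# One panel per measured variable, each with its own y scale;
# the x-axis labels show hours and minutes only.
p <- ggplot(long, aes(x = time, y = value)) +
  geom_point() +
  facet_wrap(~ variable, scales = "free_y") +
  scale_x_datetime(date_labels = "%H:%M")

print(p)
```

With the real data, `merged` would simply be the result of the merge() call above. Dropping the NA rows before plotting avoids the missing-value warnings ggplot2 would otherwise emit for the cells that the join filled in.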