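I'm trying to cbind/join two data frames that have no unique identifier. Because of the way they were web-scraped, their format is awkward. df1 contains assay results. It has 1 row for each assay run on a given day, and a header row (ID, Name, Concentration) separates the assays of different dates. df2 contains 1 row per assay date. I need to bind the assay dates from df2 to df1.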
df1 <- data.frame(X1 = c("ID", "1", "2", "3", "4", "5", "ID", "1", "2", "3", "ID", "1", "2"),
                  X2 = c("Name", "Jose", "Mary", "Doug", "Luisa", "Pam", "Name", "Jose", "Doug", "Lou", "Name", "Luisa", "Pam"),
                  X3 = c("Concentration", "4.2", "2.3", "7.3", "1.4", "0.5", "Concentration", "0.1", "2.3", "2.1", "Concentration", "9.0", "1.4"))
df2 <- data.frame(X4 = c("Monday", "Tuesday", "Friday"),
                  X5 = c("January", "February", "March"),
                  X6 = c("12", "4", "21"))
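So far I have tried to create an identifier for the assays that took place on the same day, but without success, because the number of assays per day varies widely. In reality I have over 200,000 assays from several dozen dates.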
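As an aside to the question: a per-day identifier does not require knowing the block sizes in advance. Running cumsum() over the "is this a header row?" flags increments once per header, so every assay row inherits the number of its day, which can then index df2 directly. This is a minimal sketch, not the answer's method, and it assumes (as in the example) that every day's block starts with a header row whose X1 is "ID" and that the rows of df2 are in the same order as those blocks. The names is_header, day and out are illustrative only.

```r
# Example data from the question
df1 <- data.frame(X1 = c("ID", "1", "2", "3", "4", "5", "ID", "1", "2", "3", "ID", "1", "2"),
                  X2 = c("Name", "Jose", "Mary", "Doug", "Luisa", "Pam", "Name", "Jose", "Doug", "Lou", "Name", "Luisa", "Pam"),
                  X3 = c("Concentration", "4.2", "2.3", "7.3", "1.4", "0.5", "Concentration", "0.1", "2.3", "2.1", "Concentration", "9.0", "1.4"))
df2 <- data.frame(X4 = c("Monday", "Tuesday", "Friday"),
                  X5 = c("January", "February", "March"),
                  X6 = c("12", "4", "21"))

# TRUE on each header row; cumsum() turns the flags into 1, 1, ..., 2, 2, ..., 3, ...
is_header <- df1$X1 == "ID"
day <- cumsum(is_header)

# Each assay row picks up the df2 row of its day; header rows are dropped
out <- data.frame(df1[!is_header, ], df2[day[!is_header], ], row.names = NULL)
print(out)
```

Because the identifier comes from the header rows themselves, uneven day sizes do not matter.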
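To get what you asked for, you can use which to find the header rows. The gaps between consecutive header rows give the size of each day's block (header included), and each row of df2 is then repeated that many times: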
w <- which(df1$X1 == "ID")
n <- diff(c(w, nrow(df1) + 1L))
df3 <- data.frame(df1, df2[rep.int(seq_along(n), n), ])
df3
X1 X2 X3 X4 X5 X6
1 ID Name Concentration Monday January 12
1.1 1 Jose 4.2 Monday January 12
1.2 2 Mary 2.3 Monday January 12
1.3 3 Doug 7.3 Monday January 12
1.4 4 Luisa 1.4 Monday January 12
1.5 5 Pam 0.5 Monday January 12
2 ID Name Concentration Tuesday February 4
2.1 1 Jose 0.1 Tuesday February 4
2.2 2 Doug 2.3 Tuesday February 4
2.3 3 Lou 2.1 Tuesday February 4
3 ID Name Concentration Friday March 21
3.1 1 Luisa 9.0 Friday March 21
3.2 2 Pam 1.4 Friday March 21
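To see why the rows line up, it may help to print the intermediate vectors. The sketch below rebuilds only the X1 column of the example df1 (the name x1 is illustrative). Here w holds the positions of the header rows, n the length of each day's block including its header, and idx the df2 row that rep.int(seq_along(n), n) assigns to every row of df1.

```r
# Only the X1 column of the example df1 is needed to locate the blocks
x1 <- c("ID", "1", "2", "3", "4", "5", "ID", "1", "2", "3", "ID", "1", "2")

w <- which(x1 == "ID")               # header positions: 1 7 11
n <- diff(c(w, length(x1) + 1L))     # block sizes, header included: 6 4 3
idx <- rep.int(seq_along(n), n)      # df2 row for each df1 row: 1 1 1 1 1 1 2 2 2 2 3 3 3
print(w)
print(n)
print(idx)
```

The appended length(x1) + 1L acts as a sentinel: without it, diff() would have no end point for the last day's block.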
For analysis in R, however, a data frame without the redundant header rows and with proper data types would be more useful:
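w <- which(df1$X1 == "ID")                        # positions of the header rows
n <- diff(c(w, nrow(df1) + 1L)) - 1L              # assays per day, header row excluded
# df2 has no year column, so the year (2023) is assumed
dates <- as.Date(paste(2023L, match(df2$X5, month.name), as.integer(df2$X6), sep = "-"))
df3 <- data.frame(df1[-w, ],
                  Date = rep(dates, times = n),   # rep() keeps the Date class; rep.int() returns no attributes
                  row.names = NULL)
names(df3)[seq_along(df1)] <- as.character(df1[1L, ])
df3$ID <- as.integer(df3$ID)
df3$Concentration <- as.numeric(df3$Concentration)
df3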
ID Name Concentration Date
1 1 Jose 4.2 2023-01-12
2 2 Mary 2.3 2023-01-12
3 3 Doug 7.3 2023-01-12
4 4 Luisa 1.4 2023-01-12
5 5 Pam 0.5 2023-01-12
6 1 Jose 0.1 2023-02-04
7 2 Doug 2.3 2023-02-04
8 3 Lou 2.1 2023-02-04
9 1 Luisa 9.0 2023-03-21
10 2 Pam 1.4 2023-03-21
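With more than 200,000 assays from dozens of dates, a silent misalignment is the main risk. It may therefore be worth wrapping the second approach in a function that first checks that the number of header rows equals nrow(df2). This is a sketch under stated assumptions: header rows have X1 == "ID", df2 is in the same order as the blocks, month names in X5 are English (they are matched against month.name, which avoids the locale dependence of parsing with %B), and the year is not in the data, so it has to be supplied. The function name bind_assay_dates and its year argument are hypothetical.

```r
# Example data from the question
df1 <- data.frame(X1 = c("ID", "1", "2", "3", "4", "5", "ID", "1", "2", "3", "ID", "1", "2"),
                  X2 = c("Name", "Jose", "Mary", "Doug", "Luisa", "Pam", "Name", "Jose", "Doug", "Lou", "Name", "Luisa", "Pam"),
                  X3 = c("Concentration", "4.2", "2.3", "7.3", "1.4", "0.5", "Concentration", "0.1", "2.3", "2.1", "Concentration", "9.0", "1.4"))
df2 <- data.frame(X4 = c("Monday", "Tuesday", "Friday"),
                  X5 = c("January", "February", "March"),
                  X6 = c("12", "4", "21"))

# Hypothetical helper: drops the header rows and attaches one Date per assay
bind_assay_dates <- function(df1, df2, year) {
  w <- which(df1$X1 == "ID")                       # header row positions
  if (length(w) == 0L || w[1L] != 1L) stop("df1 must start with a header row")
  if (length(w) != nrow(df2)) {
    stop(sprintf("%d header blocks in df1 but %d dates in df2", length(w), nrow(df2)))
  }
  months <- match(df2$X5, month.name)              # English month names -> 1..12
  if (anyNA(months)) stop("unrecognised month name in df2$X5")
  dates <- as.Date(sprintf("%d-%02d-%02d", as.integer(year), months, as.integer(df2$X6)))
  n <- diff(c(w, nrow(df1) + 1L)) - 1L             # assays per day, header excluded
  out <- data.frame(df1[-w, ], Date = rep(dates, times = n), row.names = NULL)
  names(out)[seq_along(df1)] <- as.character(df1[1L, ])
  out$ID <- as.integer(out$ID)
  out$Concentration <- as.numeric(out$Concentration)
  out
}

res <- bind_assay_dates(df1, df2, year = 2023)
print(res)
```

Stopping on a count mismatch is deliberate. A missing or extra date in df2 would otherwise shift every later day's date or fail with a less informative error from rep().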