I have two datasets that share the same columns.
First:
structure(list(geo = c("Alemanya", "Alemanya", "Espanya", "Espanya"
), time = structure(c(1688169600, 1690848000, 1009843200, 1012521600
), tzone = "UTC", class = c("POSIXct", "POSIXt")), C10 = c(95.9,
102.6, 84.1, 82.1), C11 = c(114.1, 109.2, 89.8, 88.6), C12 = c(71.6,
69.3, NA, NA), C13 = c(81.7, 81.6, 211.7, 207), C14 = c(90.2,
72.1, 267.9, 284.1), C15 = c(109, 102.9, 274.6, 281.8), C16 = c(85.8,
81.7, 216, 214.9), C17 = c(80.3, 82.1, 99, 94.3), C18 = c(57.1,
63, 134.3, 129.3), C19 = c(86.4, 94, 81.5, 72.4), C20 = c(79.2,
80.3, 90.5, 90.1), C21 = c(119.1, 119.1, 72.4, 71.6), C22 = c(88.9,
88.7, 113.4, 119), C23 = c(90.8, 86.3, 229.5, 231.7), C24 = c(81.3,
79.2, 117.1, 118.2), C25 = c(93.6, 95.7, 143, 151.6), C26 = c(120.9,
127.2, 167, 170.8), C27 = c(103.6, 107.7, 132.4, 131.6), C28 = c(90.9,
87.7, 111.1, 112.7), C29 = c(75.1, 70.5, 112, 114.9), C30 = c(127.3,
128.5, 155.7, 154.7), C31 = c(66.8, 76.5, 256.8, 257.9), C32 = c(108.7,
101.2, 112.5, 115), C33 = c(106.8, 105.9, 105.4, 88), D35 = c(63.5,
57, 115.2, 95.5), E36 = c(NA_real_, NA_real_, NA_real_, NA_real_
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
Second:
structure(list(geo = c("Espanya", "Alemanya"), C10 = c(0.783964803992383,
1.5), C11 = c(0.216035196007617, 2), C12 = c(NA, 0.8), C13 = c(NA,
NA), C14 = c(0.495717344753747, 0.03), C15 = c(0.504282655246253,
0.636363636363636), C16 = c(0.195470986004089, 0.74896779521057
), C17 = c(0.600537503053995, 0.25103220478943), C18 = c(0.399462496946005,
0.200188708916496), C19 = c(NA, NA), C20 = c(0.06181, 0.06181
), C21 = c(0.03649, 0.03649), C22 = c(0.04545, 0.04545), C23 = c(0.03712,
0.495717344753747), C24 = c(0.303462321792261, 0.504282655246253
), C25 = c(0.696537678207739, 0.195470986004089), C26 = c(0.27279792746114,
0.600537503053995), C27 = c(0.72720207253886, 0.399462496946005
), C28 = c(0.04592, 0.002), C29 = c(0.74896779521057, 0.1), C30 = c(0.25103220478943,
0.4), C31 = c(0.200188708916496, 0.303462321792261), C32 = c(0.173297688315773,
0.696537678207739), C33 = c(0.431042616763642, 0.27279792746114
), D35 = c(0.16484, 0.72720207253886), E36 = c(0.02858, 0.06)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -2L))
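My goal is to create a new dataset with new columns, for example column A = First$C10 * Second$C10 + First$C11 * Second$C11, column B = First$C12 * Second$C12 + First$C13 * Second$C13, and so on. Each equation must be matched on geo between the `First` and `Second` datasets.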
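You can pivot, join, and filter. Because the join builds a very large intermediate dataset before the filter trims it, this approach is only practical for data frames of reasonable length.
You also have to deal with the NA values first, since they would otherwise propagate through the multiplication and the sum.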
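library(dplyr)
library(tidyr)

df1 %>%
  # long format: one row per geo, time, and column name
  pivot_longer(matches("\\w\\d+")) %>%
  # attach every df2 value for the same geo, then keep only matching columns
  left_join(df2 %>%
              pivot_longer(-geo, names_to = "name2", values_to = "value2"),
            by = "geo") %>%
  filter(name == name2) %>%
  mutate(A = value * value2) %>%
  group_by(geo, time) %>%
  # na.rm = TRUE drops products that are NA instead of returning NA
  summarise(A = sum(A, na.rm = TRUE))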
Result:
geo | time | A |
---|---|---|
Alemanya | 2023-07-01 | 1058.735 |
Alemanya | 2023-08-01 | 1046.644 |
Espanya | 2002-01-01 | 1074.218 |
Espanya | 2002-02-01 | 1077.979 |
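The pipeline above adds up every column into a single A. If separate columns A, B, ... are wanted, as the question describes, one possible variation is sketched below. It is not part of the original answer. It assumes a hypothetical lookup table `groups` that maps each source column to its output column, and it uses small made-up stand-ins for `df1` and `df2`. Joining on both `geo` and `name` also avoids the large intermediate table, because only matching rows are ever paired.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the two datasets (hypothetical values, same shape as the originals)
df1 <- tibble(
  geo  = c("Alemanya", "Alemanya", "Espanya"),
  time = as.Date(c("2023-07-01", "2023-08-01", "2002-01-01")),
  C10  = c(1, 2, 3),
  C11  = c(4, 5, NA),
  C12  = c(2, 2, 2),
  C13  = c(1, 1, 1)
)
df2 <- tibble(
  geo = c("Espanya", "Alemanya"),
  C10 = c(10, 2),
  C11 = c(1, 0.5),
  C12 = c(3, 1),
  C13 = c(NA, 4)
)

# Hypothetical lookup: which source column feeds which output column
groups <- tibble(
  name   = c("C10", "C11", "C12", "C13"),
  target = c("A", "A", "B", "B")
)

# Reshape both tables to long format with a shared "name" key
long1 <- pivot_longer(df1, -c(geo, time), names_to = "name", values_to = "value")
long2 <- pivot_longer(df2, -geo, names_to = "name", values_to = "weight")

res <- long1 %>%
  # pair each value with the weight for the same geo and the same column
  inner_join(long2, by = c("geo", "name")) %>%
  inner_join(groups, by = "name") %>%
  group_by(geo, time, target) %>%
  # NA products are dropped, so they contribute nothing to the sum
  summarise(total = sum(value * weight, na.rm = TRUE), .groups = "drop") %>%
  # one output column per target (A, B, ...)
  pivot_wider(names_from = target, values_from = total)

print(res)
```

With the real data, `groups` would list all columns from C10 to E36 with whatever pairing is intended. Note that `na.rm = TRUE` silently treats a missing product as zero, which may or may not be the desired handling of the NA values.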