Multiply columns in two different datasets by matching values


I have two datasets that share the same columns.

First:

structure(list(geo = c("Alemanya", "Alemanya", "Espanya", "Espanya"
), time = structure(c(1688169600, 1690848000, 1009843200, 1012521600
), tzone = "UTC", class = c("POSIXct", "POSIXt")), C10 = c(95.9, 
102.6, 84.1, 82.1), C11 = c(114.1, 109.2, 89.8, 88.6), C12 = c(71.6, 
69.3, NA, NA), C13 = c(81.7, 81.6, 211.7, 207), C14 = c(90.2, 
72.1, 267.9, 284.1), C15 = c(109, 102.9, 274.6, 281.8), C16 = c(85.8, 
81.7, 216, 214.9), C17 = c(80.3, 82.1, 99, 94.3), C18 = c(57.1, 
63, 134.3, 129.3), C19 = c(86.4, 94, 81.5, 72.4), C20 = c(79.2, 
80.3, 90.5, 90.1), C21 = c(119.1, 119.1, 72.4, 71.6), C22 = c(88.9, 
88.7, 113.4, 119), C23 = c(90.8, 86.3, 229.5, 231.7), C24 = c(81.3, 
79.2, 117.1, 118.2), C25 = c(93.6, 95.7, 143, 151.6), C26 = c(120.9, 
127.2, 167, 170.8), C27 = c(103.6, 107.7, 132.4, 131.6), C28 = c(90.9, 
87.7, 111.1, 112.7), C29 = c(75.1, 70.5, 112, 114.9), C30 = c(127.3, 
128.5, 155.7, 154.7), C31 = c(66.8, 76.5, 256.8, 257.9), C32 = c(108.7, 
101.2, 112.5, 115), C33 = c(106.8, 105.9, 105.4, 88), D35 = c(63.5, 
57, 115.2, 95.5), E36 = c(NA_real_, NA_real_, NA_real_, NA_real_
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))

Second:

structure(list(geo = c("Espanya", "Alemanya"), C10 = c(0.783964803992383, 
1.5), C11 = c(0.216035196007617, 2), C12 = c(NA, 0.8), C13 = c(NA, 
NA), C14 = c(0.495717344753747, 0.03), C15 = c(0.504282655246253, 
0.636363636363636), C16 = c(0.195470986004089, 0.74896779521057
), C17 = c(0.600537503053995, 0.25103220478943), C18 = c(0.399462496946005, 
0.200188708916496), C19 = c(NA, NA), C20 = c(0.06181, 0.06181
), C21 = c(0.03649, 0.03649), C22 = c(0.04545, 0.04545), C23 = c(0.03712, 
0.495717344753747), C24 = c(0.303462321792261, 0.504282655246253
), C25 = c(0.696537678207739, 0.195470986004089), C26 = c(0.27279792746114, 
0.600537503053995), C27 = c(0.72720207253886, 0.399462496946005
), C28 = c(0.04592, 0.002), C29 = c(0.74896779521057, 0.1), C30 = c(0.25103220478943, 
0.4), C31 = c(0.200188708916496, 0.303462321792261), C32 = c(0.173297688315773, 
0.696537678207739), C33 = c(0.431042616763642, 0.27279792746114
), D35 = c(0.16484, 0.72720207253886), E36 = c(0.02858, 0.06)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -2L))

My goal is to create a new dataset with new columns, for example column A = First$C10 * Second$C10 + First$C11 * Second$C11, column B = First$C12 * Second$C12 + First$C13 * Second$C13, and so on. Each equation must match on geo between the First and Second datasets.

r match
1 Answer

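You can pivot, join, and filter (below, df1 is the first dataset and df2 the second). Because the join creates a very large intermediate dataset before filtering, this only works for data frames of reasonable length.

You also have to deal with the NA values, because any product that involves an NA is itself NA. Here they are dropped when summing, via na.rm = TRUE.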
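To illustrate the NA point, here is a minimal base R sketch. The vectors `x` and `w` are made-up stand-ins (the first value of `x` is borrowed from C12, the weights are hypothetical), not the actual data frames.

```r
# NA propagates through arithmetic in R.
x <- c(71.6, NA)   # stand-in for a column of the first dataset
w <- c(0.8, 0.5)   # hypothetical weights

prod_xw <- x * w                        # second element is NA
total   <- sum(prod_xw, na.rm = TRUE)   # NA product is dropped before summing

# Same total, but treating a missing value explicitly as 0.
total_zero <- sum(ifelse(is.na(x), 0, x) * w)
```

Note that with `na.rm = TRUE` a group in which every product is NA sums to 0 rather than NA, which may or may not be what you want.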

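library(dplyr)
library(tidyr)

df1 %>% 
  # long format: one row per geo, time and column name
  pivot_longer(matches("\\w\\d+")) %>% 
  # join on geo: every df1 row meets every df2 column for that geo
  left_join(df2 %>% 
              pivot_longer(-geo, names_to = "name2", values_to = "value2"),
            by = "geo") %>% 
  filter(name == name2) %>%   # keep only pairs of the same column
  mutate(A = value * value2) %>% 
  group_by(geo, time) %>% 
  summarise(A = sum(A, na.rm = TRUE))  # NA products are dropped here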

Result:

geo       time        A
Alemanya  2023-07-01  1058.735
Alemanya  2023-08-01  1046.644
Espanya   2002-01-01  1074.218
Espanya   2002-02-01  1077.979
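The pipeline above adds every column's product into a single A. If you want one output column per pair of inputs, as in the question (A from C10 and C11, B from C12 and C13), one possible sketch is to join on both geo and the column name, which also avoids the large intermediate table, and then group by a lookup table. The `groups` table and the `out` and `weight` names are assumptions made for illustration, and only columns C10 to C13 of the question's data are reproduced here.

```r
library(dplyr)
library(tidyr)

# Subset of the question's data (C10..C13 only), for illustration.
df1 <- tibble(
  geo  = c("Alemanya", "Alemanya", "Espanya", "Espanya"),
  time = as.POSIXct(c("2023-07-01", "2023-08-01", "2002-01-01", "2002-02-01"),
                    tz = "UTC"),
  C10  = c(95.9, 102.6, 84.1, 82.1),
  C11  = c(114.1, 109.2, 89.8, 88.6),
  C12  = c(71.6, 69.3, NA, NA),
  C13  = c(81.7, 81.6, 211.7, 207)
)
df2 <- tibble(
  geo = c("Espanya", "Alemanya"),
  C10 = c(0.783964803992383, 1.5),
  C11 = c(0.216035196007617, 2),
  C12 = c(NA, 0.8),
  C13 = c(NA, NA)
)

# Hypothetical lookup: which input column feeds which output column.
groups <- tibble(
  name = c("C10", "C11", "C12", "C13"),
  out  = c("A", "A", "B", "B")
)

result <- df1 %>%
  pivot_longer(-c(geo, time)) %>%                    # columns: geo, time, name, value
  inner_join(pivot_longer(df2, -geo, values_to = "weight"),
             by = c("geo", "name")) %>%              # match on geo AND column name
  inner_join(groups, by = "name") %>%                # attach the output label
  group_by(geo, time, out) %>%
  summarise(value = sum(value * weight, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = out, values_from = value) # one column per output label
```

For Alemanya on 2023-07-01 this gives A = 95.9 * 1.5 + 114.1 * 2 = 372.05. Pairs where every product is NA (B for Espanya in this subset) come out as 0 because of `na.rm = TRUE`.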