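Imagine I have a data frame loaded from a large CSV that someone gave me, containing a mapping/recoding to apply to the data in other datasets. Here is a small reproducible example of what might be in the CSV: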
library(wakefield)
csv_mapping <- data.frame(
from = as.character(name(30)),
to = as.character(likert_7(30))
)
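What is the fastest way to create a mapping function from this data frame that does not depend on the CSV data source? I would usually do it by running: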
dput(csv_mapping$from)
dput(csv_mapping$to)
in my console, then copying and pasting the vectors into a function and using plyr::mapvalues(), like this:
mapping_fn <- function(x) {
fromvec <- c("Kameira", "Sanavi", "Avangelene", "Maryonna", "Wyvonna", "Enam",
"Yain", "Tyonna", "Shekira", "Eleanna", "Azriela", "Saajida",
"Chantee", "Julieanne", "Genisha", "Delesha", "Macenzi", "Alyasia",
"Latonga", "Josuhe", "Arter", "Stone", "Ramaj", "Lilinoe", "Zacharie",
"Joshuamichael", "Desseray", "Colorado", "Jaidn", "Verline")
tovec <- c("Agree", "Somewhat Disagree", "Agree", "Agree", "Neutral",
"Somewhat Disagree", "Neutral", "Strongly Agree", "Somewhat Disagree",
"Disagree", "Strongly Disagree", "Disagree", "Somewhat Agree",
"Strongly Disagree", "Strongly Disagree", "Somewhat Agree", "Strongly Agree",
"Somewhat Agree", "Disagree", "Disagree", "Strongly Agree", "Strongly Disagree",
"Disagree", "Somewhat Agree", "Strongly Disagree", "Strongly Disagree",
"Neutral", "Somewhat Agree", "Agree", "Disagree")
plyr::mapvalues(x, from = fromvec, to = tovec, warn_missing = FALSE)
}
Given that plyr is now retired, is there a smarter or quicker way to do this without using mapvalues?
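One natural approach is to use a join. This is especially useful if your data already lives in a data frame, but it can be massaged to work if you only want a vector of mapped values.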
Say we have a mapping defined by the CSV, like this:
csv_mapping <- data.frame(from = c("Kameira", "Sanavi", "Avangelene",
"Maryonna", "Wyvonna"),
to = c("Agree", "Somewhat Disagree", "Agree",
"Agree", "Neutral"))
csv_mapping
#> from to
#> 1 Kameira Agree
#> 2 Sanavi Somewhat Disagree
#> 3 Avangelene Agree
#> 4 Maryonna Agree
#> 5 Wyvonna Neutral
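Then say we have a data frame df, where the column x holds the values we want to map to new values. Note that df can contain other columns as well; here we add a column of random values for demonstration.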
df <- data.frame(x = c("Sanavi", "Maryonna", "Maryonna", "Wyvonna",
"Kameira","Avangelene", "Sanavi", "Wyvonna"),
vals = rnorm(8))
df
#> x vals
#> 1 Sanavi -0.95005745
#> 2 Maryonna -0.20650715
#> 3 Maryonna -0.07755789
#> 4 Wyvonna 1.72379970
#> 5 Kameira -1.36642679
#> 6 Avangelene -1.48638577
#> 7 Sanavi 0.16987157
#> 8 Wyvonna -0.55194346
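We can then use dplyr's left_join to bring the mapped values into the data frame, matching the x column of df against the from column of the mapping. (The dplyr documentation on joins has more detail.)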
dplyr::left_join(df, csv_mapping, by = c("x" = "from"))
#> x vals to
#> 1 Sanavi -0.95005745 Somewhat Disagree
#> 2 Maryonna -0.20650715 Agree
#> 3 Maryonna -0.07755789 Agree
#> 4 Wyvonna 1.72379970 Neutral
#> 5 Kameira -1.36642679 Agree
#> 6 Avangelene -1.48638577 Agree
#> 7 Sanavi 0.16987157 Somewhat Disagree
#> 8 Wyvonna -0.55194346 Neutral
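At this point, each value of x has its corresponding value from the mapping in the to column. If you only need the mapped values, just extract the to column from the data frame.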
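If a plain function over a vector is what you are after, the same lookup can be sketched in base R with match() rather than a dplyr join. This is only a minimal sketch under some assumptions: make_mapping_fn is a hypothetical helper name (not part of any package), the from values are assumed to be unique, and values with no entry in the mapping are passed through unchanged, similar to what plyr::mapvalues does. Note that left_join, by contrast, returns NA for values with no match.

```r
# Hypothetical helper (not from any package): build a mapping function
# from two parallel vectors. The returned closure captures `from` and `to`,
# so it no longer needs the original data frame once it has been created.
make_mapping_fn <- function(from, to) {
  from <- as.character(from)
  to <- as.character(to)
  # Assumption: one `to` per `from`, and no duplicated keys.
  stopifnot(length(from) == length(to), !anyDuplicated(from))

  function(x) {
    x <- as.character(x)
    idx <- match(x, from)        # position of each x in `from`, NA if absent
    out <- to[idx]               # mapped values, NA where there is no match
    unmatched <- is.na(idx)
    out[unmatched] <- x[unmatched]  # keep unmapped values unchanged
    out
  }
}

csv_mapping <- data.frame(
  from = c("Kameira", "Sanavi", "Avangelene", "Maryonna", "Wyvonna"),
  to = c("Agree", "Somewhat Disagree", "Agree", "Agree", "Neutral"),
  stringsAsFactors = FALSE
)

mapping_fn <- make_mapping_fn(csv_mapping$from, csv_mapping$to)
result <- mapping_fn(c("Sanavi", "Maryonna", "Unknown"))
print(result)
```

The function still has to be built once from the loaded CSV, so this removes the copy-and-paste step but not the need to read the mapping at least once per session.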
Created on 2020-06-03 by the reprex package (v0.3.0)