R: Create new columns in a data frame using column names, conditions and values from another data frame


Consider the base data frame:

data <-  data.frame(amount_bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"),
                   risk_score = c("0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900"))

and the grouping information, stored in another data frame:

group_info <- data.frame(variable = c("amount_bin_group", "amount_bin_group", "amount_bin_group", "amount_bin_group", "amount_bin_group",
                                      "risk_score_group", "risk_score_group", "risk_score_group", "risk_score_group", "risk_score_group"),
                         bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
                                 "0-700", "700-750", "750-800", "800-850", "850-900"),
                         group = c("1", "1", "2", "2", "3",
                                   "a", "a", "a", "b", "b"))

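I want to create two columns, "amount_bin_group" and "risk_score_group", in the base data frame (data). Each takes its value from group_info$group wherever the bin column in group_info matches the corresponding column in data. For simplicity, assume the base column is always the group_info$variable name minus the "_group" suffix. That is, when creating the column amount_bin_group, the base column is always amount_bin in the base data frame.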
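A minimal sketch of that naming rule, assuming every value of group_info$variable ends in "_group": strip the suffix with `sub()` and check that each resulting base column exists in `data`. The two inputs are rebuilt compactly with `rep()` (same values as above) so the snippet runs on its own.

```r
# Rebuild the example inputs compactly (same values as in the question)
data <- data.frame(
  amount_bin = rep(c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"), 3),
  risk_score = rep(c("0-700", "700-750", "750-800", "800-850", "850-900"), 3),
  stringsAsFactors = FALSE
)
group_info <- data.frame(
  variable = rep(c("amount_bin_group", "risk_score_group"), each = 5),
  bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
          "0-700", "700-750", "750-800", "800-850", "850-900"),
  group = c("1", "1", "2", "2", "3", "a", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Names of the columns to create come from group_info$variable
new_cols <- unique(group_info$variable)
# Each base column is the new name with the trailing "_group" removed
base_cols <- sub("_group$", "", new_cols)
print(base_cols)
# Fail early if any base column is missing from data
stopifnot(all(base_cols %in% names(data)))
```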

The expected resulting data frame is:

final_data <- data.frame(amount_bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"),
                         risk_score = c("0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900"),
                         amount_bin_group = c("1", "1", "2", "2", "3", "1", "1", "2", "2", "3", "1", "1", "2", "2", "3"),
                         risk_score_group = c("a", "a", "a", "b", "b", "a", "a", "a", "b", "b", "a", "a", "a", "b", "b"))

The solution I came up with is to merge the data frames one variable at a time, i.e.:

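# Join only the amount_bin lookup rows onto data
final_data <- merge(data, group_info[group_info$variable == "amount_bin_group", c("bin", "group")],
                    by.x = "amount_bin", by.y = "bin")

# Rename the joined column and drop the generic "group" column
final_data$amount_bin_group <- final_data$group
final_data$group <- NULL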

However, I believe there is a more efficient solution. Note that there are many such columns, not just two, so perhaps a loop would help.
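One loop-based option in base R, offered as a sketch rather than either answer's method: for each variable, take only its lookup rows and use `match()` to map every value of the base column to its group. Unlike `merge()`, this keeps the original row order, and any bin with no lookup entry becomes `NA`. The inputs are rebuilt compactly so the snippet is self-contained.

```r
data <- data.frame(
  amount_bin = rep(c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"), 3),
  risk_score = rep(c("0-700", "700-750", "750-800", "800-850", "850-900"), 3),
  stringsAsFactors = FALSE
)
group_info <- data.frame(
  variable = rep(c("amount_bin_group", "risk_score_group"), each = 5),
  bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
          "0-700", "700-750", "750-800", "800-850", "850-900"),
  group = c("1", "1", "2", "2", "3", "a", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

for (v in unique(group_info$variable)) {
  # Lookup rows for this variable only, e.g. amount_bin_group
  lookup <- group_info[group_info$variable == v, ]
  # Base column: the variable name without the "_group" suffix
  base_col <- sub("_group$", "", v)
  # Position of each data value within lookup$bin (NA if absent)
  idx <- match(data[[base_col]], lookup$bin)
  # New column, aligned row by row with data
  data[[v]] <- lookup$group[idx]
}
```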

Tags: r, dplyr, plyr, tidyr
2 Answers

Answer 1

You can use a for loop to merge in each set of groups in turn:

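for (i in unique(group_info$variable)) {
  # Join this variable's lookup rows; the base column is i without "_group"
  data <- merge(
    data, group_info[group_info$variable == i, c("bin", "group")],
    by.x = sub("_group$", "", i), by.y = "bin"
  )
  # Rename the joined "group" column to the new column name
  names(data)[names(data) == "group"] <- i
}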
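Note that `merge()` sorts the result by the join key by default (`sort = TRUE`) and, without `all.x = TRUE`, drops rows of `data` whose bin has no match. A sketch of the same loop that keeps every row in its original order, using a temporary helper column called `row_id` (a hypothetical name introduced only for this example):

```r
data <- data.frame(
  amount_bin = rep(c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"), 3),
  risk_score = rep(c("0-700", "700-750", "750-800", "800-850", "850-900"), 3),
  stringsAsFactors = FALSE
)
group_info <- data.frame(
  variable = rep(c("amount_bin_group", "risk_score_group"), each = 5),
  bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
          "0-700", "700-750", "750-800", "800-850", "850-900"),
  group = c("1", "1", "2", "2", "3", "a", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column that remembers the original row order
data$row_id <- seq_len(nrow(data))
for (i in unique(group_info$variable)) {
  data <- merge(
    data, group_info[group_info$variable == i, c("bin", "group")],
    by.x = sub("_group$", "", i), by.y = "bin",
    all.x = TRUE  # keep rows whose bin has no match (group becomes NA)
  )
  names(data)[names(data) == "group"] <- i
}
# Restore the original row order, then drop the helper column
data <- data[order(data$row_id), setdiff(names(data), "row_id")]
rownames(data) <- NULL
```

The column order still follows `merge()` (each join key moves to the front), so reorder the columns afterwards if that matters.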

Answer 2

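Your group_info is simply too tidy. I can't believe I'm actually saying that. If you split it into two data frames, or spread each half into its own columns, a simple left join gets you the answer.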

library(dplyr)

final_data_calc <- data %>%
  # Lookup for amount_bin: rename bin/group to match the target columns
  left_join(
    group_info %>%
      filter(variable == "amount_bin_group") %>%
      rename(amount_bin_group = group, amount_bin = bin) %>%
      select(-variable),
    by = "amount_bin"
  ) %>%
  # Same for risk_score
  left_join(
    group_info %>%
      filter(variable == "risk_score_group") %>%
      rename(risk_score_group = group, risk_score = bin) %>%
      select(-variable),
    by = "risk_score"
  )

#   amount_bin risk_score amount_bin_group risk_score_group
#1     10K-25K      0-700                1                a
#2     25K-35K    700-750                1                a
#3     35K-45K    750-800                2                a
#4     45K-50K    800-850                2                b
#5        50K+    850-900                3                b
#6     10K-25K      0-700                1                a
#7     25K-35K    700-750                1                a
#8     35K-45K    750-800                2                a
#9     45K-50K    800-850                2                b
#10       50K+    850-900                3                b
#11    10K-25K      0-700                1                a
#12    25K-35K    700-750                1                a
#13    35K-45K    750-800                2                a
#14    45K-50K    800-850                2                b
#15       50K+    850-900                3                b
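The same left-join idea can be generalized to any number of variables. A sketch, assuming dplyr is installed: `split()` breaks group_info into one (bin, group) lookup per variable, each lookup's columns are renamed to the base column and the new column, and base R's `Reduce()` folds the joins over `data`, starting from `data` itself via `init`.

```r
library(dplyr)

data <- data.frame(
  amount_bin = rep(c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"), 3),
  risk_score = rep(c("0-700", "700-750", "750-800", "800-850", "850-900"), 3),
  stringsAsFactors = FALSE
)
group_info <- data.frame(
  variable = rep(c("amount_bin_group", "risk_score_group"), each = 5),
  bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
          "0-700", "700-750", "750-800", "800-850", "850-900"),
  group = c("1", "1", "2", "2", "3", "a", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# One lookup table (bin, group) per variable, named by the variable
lookups <- split(group_info[, c("bin", "group")], group_info$variable)

final_data_calc <- Reduce(function(acc, v) {
  base_col <- sub("_group$", "", v)
  lk <- lookups[[v]]
  # Rename bin -> base column and group -> new column, e.g. amount_bin_group
  names(lk) <- c(base_col, v)
  # left_join keeps every row of acc, in its original order
  left_join(acc, lk, by = base_col)
}, names(lookups), init = data)
```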
© www.soinside.com 2019 - 2024. All rights reserved.