I have two data frames, named `bld_g` and `cat_df`.
library(dplyr)  # also provides tibble() and %>%

# Creating bld_g
bld_g <- tibble(
st_con_rt = c("sub-room", "", "main-room", "sub-room", "sub-room", "sub-room", "sub-room", "", "main-room", "main-room", "main-room", "main-room", "", "sub-room"),
st_con_tr = c("direct", "", "direct", "direct", "direct", "direct", "direct", "", "terrace", "terrace", "direct", "terrace", "", "terrace"),
st_th = c("", "", "hira", "", "", "", "", "", "tsuma", "tsuma", "tsuma", "tsuma", "", ""),
st_adsb = c("add", "", "sub", "sub", "add", "add", "add", "", "add", "add", "sub", "add", "", "sub"),
tr_adsb = c("sub", "", "", "sub", "sub", "", "sub", "", "sub", "sub", "add", "sub", "", "sub"),
st_sub_main_th = c("tsuma", "hira", "hira", "hira", "hira", "hira", "NA", "other", "", "", "", "", "hira", "hira"),
roo_com = c("3b+7", "2a+7", "1b+7", "1a+7", "2a+7", "1a+7", "7", "1b", "1a+7", "4a", "2a", "4a", "7", "2a+7")
)
# Creating cat_df
cat_df <- tibble(
type = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
st_con_rt = c("sub-room", "sub-room", "sub-room", "sub-room", "sub-room", "main-room", "sub-room", "sub-room", "sub-room"),
st_con_tr = c("direct", "direct", "direct", "direct", "direct", "terrace", "terrace", "terrace", "terrace"),
st_th = c("tsuma", "tsuma", "tsuma", NA, NA, "tsuma", "tsuma", "tsuma", "tsuma"),
st_adsb = c("add", "add", "add", "sub", "sub", "add", "add", "add", "add"),
tr_adsb = c("sub", "sub", "sub", "sub", "sub", "sub", "sub", "sub", "sub"),
st_sub_main_th = c("hira", "hira", "hira", "hira", "hira", NA, "hira", "hira", "hira"),
roo_com = c("1a+7", "2a+7", "4a", "1a+7", "2a+7", "4a", "1a+7", "2a+7", "4a")
)
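`cat_df` is the reference data, and `bld_g` is the data I need to update. They have the same fields but different values. `cat_df` contains the `type` field that I want to insert into `bld_g`.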
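What I need to do is:
- For each row of `bld_g`, find the corresponding row in `cat_df` (whether or not every field value matches).
- If all values in a `bld_g` row are the same as in a `cat_df` row, copy the `type` value from `cat_df` to `bld_g` (the additional column will be named `type`); otherwise enter 0.
- If a field is `NA` in `cat_df`, that column can be left out of the comparison.
- Some fields in `bld_g` are blank. That only means "no information"; it has no special meaning.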
In this way I want to determine which type each row of `bld_g` belongs to.
I found several posts similar to my question, such as "Compare columns between two data frames in R", but they do not fit my case.
I thought it could be solved easily with a multi-column `left_join` like the one below, but it does not work because `bld_g` contains blanks.
outcome <- bld_g %>% left_join(cat_df, by=c("st_con_rt", "st_con_tr",
"st_th", "st_adsb", "tr_adsb", "st_sub_main_th", "roo_com"))
I usually use the `dplyr` family for data manipulation, but other solutions are welcome too. I look forward to your suggestions.
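One likely reason the join finds no matches is that missing information is stored as `""` in `bld_g` but as `NA` in `cat_df`, and a join does not treat those two as equal. A minimal sketch, assuming blanks and `NA` should count as the same thing: convert blanks to `NA` with `na_if()` before joining (dplyr joins match `NA` to `NA` by default), then fill unmatched rows with 0. The tibbles `bld_small` and `cat_small` are cut-down stand-ins with only three fields, not the full data.

```r
library(dplyr)

# Cut-down stand-ins for bld_g and cat_df (three fields only, for illustration)
bld_small <- tibble(
  st_con_rt = c("sub-room", "sub-room", "main-room"),
  st_th     = c("", "", "tsuma"),
  roo_com   = c("1a+7", "3b+7", "4a")
)
cat_small <- tibble(
  type      = c(4, 6),
  st_con_rt = c("sub-room", "main-room"),
  st_th     = c(NA, "tsuma"),
  roo_com   = c("1a+7", "4a")
)

outcome <- bld_small %>%
  # Blank means "no information", so store it as NA like cat_small does
  mutate(across(everything(), ~ na_if(.x, ""))) %>%
  # Joins match NA to NA by default (na_matches = "na")
  left_join(cat_small, by = c("st_con_rt", "st_th", "roo_com")) %>%
  # Rows without a match get 0
  mutate(type = coalesce(type, 0))

outcome$type
```

With the full data, the `by` vector would list all seven shared columns. This treats `NA` as equal to blank; it does not skip `NA` columns during the comparison.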
Hopefully this is what you are after:
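bld_g %>%
  mutate(cat = {
    match(
      # One pasted string per bld_g row (`.` is bld_g, the left side of the pipe)
      do.call(paste, .),
      # One pasted string per cat_df row, with NA turned into "" so it equals a blank
      cat_df %>%
        mutate(across(-type, ~ if_else(is.na(.x), "", .x))) %>%
        {
          do.call(paste, select(., -type))
        }
    )
  })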
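This gives a `cat` column identifying the matching row number in `cat_df` (`NA` where no row matches). Because `type` runs from 1 to 9 in row order, the row number here is also the `type` value.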
# A tibble: 14 × 8
st_con_rt st_con_tr st_th st_adsb tr_adsb st_sub_main_th roo_com cat
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 "sub-room" "direct" "" "add" "sub" "tsuma" 3b+7 NA
2 "" "" "" "" "" "hira" 2a+7 NA
3 "main-room" "direct" "hira" "sub" "" "hira" 1b+7 NA
4 "sub-room" "direct" "" "sub" "sub" "hira" 1a+7 4
5 "sub-room" "direct" "" "add" "sub" "hira" 2a+7 NA
6 "sub-room" "direct" "" "add" "" "hira" 1a+7 NA
7 "sub-room" "direct" "" "add" "sub" "NA" 7 NA
8 "" "" "" "" "" "other" 1b NA
9 "main-room" "terrace" "tsuma" "add" "sub" "" 1a+7 NA
10 "main-room" "terrace" "tsuma" "add" "sub" "" 4a 6
11 "main-room" "direct" "tsuma" "sub" "add" "" 2a NA
12 "main-room" "terrace" "tsuma" "add" "sub" "" 4a 6
13 "" "" "" "" "" "hira" 7 NA
14 "sub-room" "terrace" "" "sub" "sub" "hira" 2a+7 NA
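Note that the code above treats `NA` in `cat_df` as equal to a blank in `bld_g`. If `NA` in `cat_df` should instead mean "skip this column when comparing", as the question's third rule can be read, a row-by-row comparison is one option. The following is a sketch under that assumption, not a drop-in replacement: `match_type()` is a hypothetical helper name, the tibbles are cut-down illustrations with three fields, and rows 2 and 3 of `bld_small` are made up to show the difference.

```r
library(dplyr)

# Cut-down illustrations (three fields only); rows 2 and 3 of bld_small are made up
bld_small <- tibble(
  st_con_rt = c("sub-room", "sub-room", "main-room"),
  st_th     = c("", "hira", "hira"),
  roo_com   = c("1a+7", "1a+7", "4a")
)
cat_small <- tibble(
  type      = c(4, 6),
  st_con_rt = c("sub-room", "main-room"),
  st_th     = c(NA, "tsuma"),
  roo_com   = c("1a+7", "4a")
)

cols <- setdiff(names(cat_small), "type")

# Hypothetical helper: type of the first cat_small row whose non-NA
# fields all equal row i of bld_small, or 0 if there is no such row
match_type <- function(i) {
  obs <- unlist(bld_small[i, cols])
  hit <- vapply(seq_len(nrow(cat_small)), function(j) {
    ref <- unlist(cat_small[j, cols])
    # NA in the reference acts as a wildcard for that column
    isTRUE(all(is.na(ref) | ref == obs))
  }, logical(1))
  if (any(hit)) cat_small$type[which(hit)[1]] else 0
}

bld_small$type <- vapply(seq_len(nrow(bld_small)), match_type, numeric(1))
bld_small$type
```

Here row 2 has `st_th = "hira"` but still gets type 4, because `st_th` is `NA` in that reference row and is skipped. The paste-and-match approach would return no match for the same row. The nested loop is quadratic in the number of rows, which is fine for data of this size.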