R: Compare columns in two data frames and add the value of a specific column if all conditions match


I have two data frames named `bld_g` and `cat_df`.

library(dplyr)  # provides tibble(), left_join(), mutate(), across()

# Creating bld_g
bld_g <- tibble(
  st_con_rt = c("sub-room", "", "main-room", "sub-room", "sub-room", "sub-room", "sub-room", "", "main-room", "main-room", "main-room", "main-room", "", "sub-room"),
  st_con_tr = c("direct", "", "direct", "direct", "direct", "direct", "direct", "", "terrace", "terrace", "direct", "terrace", "", "terrace"),
  st_th = c("", "", "hira", "", "", "", "", "", "tsuma", "tsuma", "tsuma", "tsuma", "", ""),
  st_adsb = c("add", "", "sub", "sub", "add", "add", "add", "", "add", "add", "sub", "add", "", "sub"),
  tr_adsb = c("sub", "", "", "sub", "sub", "", "sub", "", "sub", "sub", "add", "sub", "", "sub"),
  st_sub_main_th = c("tsuma", "hira", "hira", "hira", "hira", "hira", "NA", "other", "", "", "", "", "hira", "hira"),
  roo_com = c("3b+7", "2a+7", "1b+7", "1a+7", "2a+7", "1a+7", "7", "1b", "1a+7", "4a", "2a", "4a", "7", "2a+7")
)

# Creating cat_df
cat_df <- tibble(
  type = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  st_con_rt = c("sub-room", "sub-room", "sub-room", "sub-room", "sub-room", "main-room", "sub-room", "sub-room", "sub-room"),
  st_con_tr = c("direct", "direct", "direct", "direct", "direct", "terrace", "terrace", "terrace", "terrace"),
  st_th = c("tsuma", "tsuma", "tsuma", NA, NA, "tsuma", "tsuma", "tsuma", "tsuma"),
  st_adsb = c("add", "add", "add", "sub", "sub", "add", "add", "add", "add"),
  tr_adsb = c("sub", "sub", "sub", "sub", "sub", "sub", "sub", "sub", "sub"),
  st_sub_main_th = c("hira", "hira", "hira", "hira", "hira", NA, "hira", "hira", "hira"),
  roo_com = c("1a+7", "2a+7", "4a", "1a+7", "2a+7", "4a", "1a+7", "2a+7", "4a")
)

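`cat_df` is the reference data, and `bld_g` is the data I need to update. They share the same fields (apart from `type`) but hold different values. `cat_df` contains the `type` field that I want to add to `bld_g`.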

What I need to do is:

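  • For each row of `bld_g`, find the corresponding row in `cat_df` by checking whether the values in every field (excluding `type` in `cat_df`) all match.
  • If all values in a `bld_g` row are the same as in a `cat_df` row, copy the `type` value from `cat_df` to `bld_g` (the new column will be named `type`); otherwise enter 0.
  • If `cat_df` has an `NA` in a field, that field can be omitted from the comparison.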
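The rules in this list can be sketched in base R as below. This is a minimal sketch, not code from the question: the helper `assign_type()` and the toy data frames are hypothetical, and it assumes that when several reference rows match, the first one wins. A blank `""` is compared like any other value, while an `NA` in the reference table causes that column to be skipped for that reference row.

```r
# Hypothetical helper (not from any package): for each row of `target`, return
# the `type` of the first `ref` row whose non-NA key values all equal the
# target row's values, or 0 when no `ref` row matches.
assign_type <- function(target, ref, key_cols = setdiff(names(ref), "type")) {
  result  <- rep(0, nrow(target))      # default when nothing matches
  matched <- rep(FALSE, nrow(target))  # target rows that already have a type
  for (j in seq_len(nrow(ref))) {
    ok <- rep(TRUE, nrow(target))
    for (col in key_cols) {
      ref_val <- ref[[col]][j]
      if (is.na(ref_val)) next         # NA in the reference: column is ignored
      ok <- ok & !is.na(target[[col]]) & (target[[col]] == ref_val)
    }
    hit <- ok & !matched               # keep the first matching reference row
    result[hit] <- ref$type[j]
    matched <- matched | hit
  }
  result
}

# Hypothetical toy data: column `a` of the second reference row is NA.
ref <- data.frame(type = c(1, 2), a = c("x", NA), b = c("p", "q"),
                  stringsAsFactors = FALSE)
target <- data.frame(a = c("x", "", "y"), b = c("p", "q", "p"),
                     stringsAsFactors = FALSE)

assign_type(target, ref)
```

Applied to the question's data as `bld_g$type <- assign_type(bld_g, cat_df)`, this rule should give type 4 for row 4, type 6 for rows 10 and 12, and 0 for every other row.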

Some fields in `bld_g` contain blanks (`""`). A blank just means "no information"; it has no special meaning.

This way I want to determine which type each row of `bld_g` belongs to.

I found several posts similar to my question, such as "Compare columns between two data frames in R", but none of them fit my case.

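I thought this could be solved easily with a `left_join` on several columns, like the one below, but it doesn't work because `bld_g` contains blanks: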

outcome <- bld_g %>% left_join(cat_df, by=c("st_con_rt", "st_con_tr",
  "st_th", "st_adsb", "tr_adsb", "st_sub_main_th", "roo_com")) 
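Why the join misses can be seen on a pair of hypothetical toy tables. This sketch assumes dplyr 1.0 or later, where `left_join()` requires every key column to be exactly equal and, with the default `na_matches = "na"`, matches an `NA` key only to another `NA`. So an `NA` in `cat_df` is never treated as a wildcard, and it does not match a blank `""` in `bld_g`.

```r
library(dplyr)

# Hypothetical toy tables: the reference row has NA in `a`;
# the first target row has "" there, the second has NA.
ref    <- tibble(type = 1, a = NA_character_, b = "p")
target <- tibble(a = c("", NA), b = c("p", "p"))

# Equi-join on both columns: "" never equals NA,
# while NA equals NA under the default na_matches = "na".
out <- left_join(target, ref, by = c("a", "b"))
out$type
```

Neither outcome is the "skip this column when `cat_df` has `NA`" rule from the question, which a single equi-join on a fixed set of columns cannot express.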

I usually use the `dplyr` family of packages for data manipulation, but other solutions are welcome too. I'd appreciate any suggestions.

1 Answer

Hopefully this is what you're after:

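library(dplyr)

bld_g %>%
  mutate(cat = {
    match(
      # one key string per bld_g row, pasted from all of its columns
      do.call(paste, .),
      cat_df %>%
        # NA in cat_df becomes "", so such a column matches only a blank in bld_g
        mutate(across(-type, ~ if_else(is.na(.x), "", .x))) %>%
        {
          # one key string per cat_df row, leaving out the `type` column
          do.call(paste, select(., -type))
        }
    )
  })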

This gives a `cat` column that identifies the matching row number in `cat_df` (`NA` where no row matches):
# A tibble: 14 × 8
   st_con_rt   st_con_tr st_th   st_adsb tr_adsb st_sub_main_th roo_com   cat
   <chr>       <chr>     <chr>   <chr>   <chr>   <chr>          <chr>   <int>
 1 "sub-room"  "direct"  ""      "add"   "sub"   "tsuma"        3b+7       NA
 2 ""          ""        ""      ""      ""      "hira"         2a+7       NA
 3 "main-room" "direct"  "hira"  "sub"   ""      "hira"         1b+7       NA
 4 "sub-room"  "direct"  ""      "sub"   "sub"   "hira"         1a+7        4
 5 "sub-room"  "direct"  ""      "add"   "sub"   "hira"         2a+7       NA
 6 "sub-room"  "direct"  ""      "add"   ""      "hira"         1a+7       NA
 7 "sub-room"  "direct"  ""      "add"   "sub"   "NA"           7          NA
 8 ""          ""        ""      ""      ""      "other"        1b         NA
 9 "main-room" "terrace" "tsuma" "add"   "sub"   ""             1a+7       NA
10 "main-room" "terrace" "tsuma" "add"   "sub"   ""             4a          6
11 "main-room" "direct"  "tsuma" "sub"   "add"   ""             2a         NA
12 "main-room" "terrace" "tsuma" "add"   "sub"   ""             4a          6
13 ""          ""        ""      ""      ""      "hira"         7          NA
14 "sub-room"  "terrace" ""      "sub"   "sub"   "hira"         2a+7       NA
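Because `cat` holds a row number of `cat_df` (here it coincides with `type`, which runs from 1 to 9) and is `NA` when nothing matches, it can be turned into the `type`/0 column the question asks for. A minimal sketch, assuming dplyr and using hypothetical toy inputs in which `type` differs from the row number, so the lookup is visible:

```r
library(dplyr)

# Hypothetical reference table whose `type` values are not 1, 2, 3.
ref <- tibble(type = c(10, 20, 30))
# Hypothetical `cat` column as produced by match(): row numbers of `ref`, NA = no match.
res <- tibble(cat = c(NA, 2L, 3L))

res <- res %>%
  mutate(type = coalesce(ref$type[cat], 0))  # look up type by row number, NA -> 0
res$type
```

On the question's data, appending `%>% mutate(type = coalesce(cat_df$type[cat], 0))` to the answer's pipeline would do the same.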