I have two data frames. The first:
NAME
1 SMALL H
2 ZITT M
3 SMITH E
4 GLANZEL W
5 HUANG MH
6 THIJS B
and the second:
name address
SIBLEY B SOME ADDRESS 1
STEWART C;KOCH A SOME ADDRESS 2
HILL GM;LEE A;SMITH E SOME ADDRESS 3
DAVIS L SOME ADDRESS 4
MERCIER K;SMITH E;GIBBONE A SOME ADDRESS 5
DAVIDSON S;BEKIARI A SOME ADDRESS 6
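I would like to match each NAME in the first table against the entries in the second table's name column that contain that string, and then pull in the data from the address column, somewhat like a VLOOKUP. It also has to handle multiple instances of the same name. In the example above, the name SMITH E (two different people) would produce matches, giving the following result: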
NAME ADDRESS 1 ADDRESS 2
1 SMALL H
2 ZITT M
3 SMITH E SOME ADDRESS 5 SOME ADDRESS 3
4 GLANZEL W
5 HUANG MH
6 THIJS B
Here is a tidyverse solution. I first clean up the second table by splitting each entry into individual names. We can then use left_join to match the entries:
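library(tidyverse)

# One row per individual name: split on ";" and unnest the resulting list column
df2_clean <- df2 %>%
  mutate(name = str_split(name, ";")) %>%
  unnest(name)

# Keep every row of df1; names without a match get NA in address
df1 %>%
  left_join(df2_clean, by = c("NAME" = "name"))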
#> NAME address
#> 1 SMALL H <NA>
#> 2 ZITT M <NA>
#> 3 SMITH E SOME ADDRESS 3
#> 4 SMITH E SOME ADDRESS 5
#> 5 GLANZEL W <NA>
#> 6 HUANG MH <NA>
#> 7 THIJS B <NA>
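For comparison, here is a minimal base-R sketch of the same split-then-join step, in case the tidyverse isn't available. It swaps in strsplit()/lengths() for str_split()/unnest() and merge(all.x = TRUE) for left_join(); df2_long and matched are hypothetical names, and the example data are typed in directly instead of being read with read.delim().

```r
# Example data (same values as in the question)
df1 <- data.frame(NAME = c("SMALL H", "ZITT M", "SMITH E",
                           "GLANZEL W", "HUANG MH", "THIJS B"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  name = c("SIBLEY B", "STEWART C;KOCH A", "HILL GM;LEE A;SMITH E",
           "DAVIS L", "MERCIER K;SMITH E;GIBBONE A", "DAVIDSON S;BEKIARI A"),
  address = paste("SOME ADDRESS", 1:6),
  stringsAsFactors = FALSE
)

# Split each semicolon-separated name list into a character vector
name_parts <- strsplit(df2$name, ";", fixed = TRUE)

# One row per individual name, repeating each address once per name it covers
df2_long <- data.frame(
  name = unlist(name_parts),
  address = rep(df2$address, lengths(name_parts)),
  stringsAsFactors = FALSE
)

# Left join: keep every row of df1, NA where no address matches
matched <- merge(df1, df2_long, by.x = "NAME", by.y = "name", all.x = TRUE)
matched
```

Note that merge() sorts the result by the key column by default, so the rows come out in alphabetical order of NAME rather than in df1's original order.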
If you really want, you can spread SMITH E's two addresses into two columns, but I'd recommend keeping the long format here:
# Number each address within a name, then spread to address_1, address_2, ...
df1 %>%
  left_join(df2_clean, by = c("NAME" = "name")) %>%
  group_by(NAME) %>%
  mutate(add_c = row_number()) %>%
  pivot_wider(id_cols = NAME, names_from = add_c,
              names_prefix = "address_", values_from = address)
#> # A tibble: 6 x 3
#> # Groups: NAME [6]
#> NAME address_1 address_2
#> <chr> <chr> <chr>
#> 1 SMALL H <NA> <NA>
#> 2 ZITT M <NA> <NA>
#> 3 SMITH E SOME ADDRESS 3 SOME ADDRESS 5
#> 4 GLANZEL W <NA> <NA>
#> 5 HUANG MH <NA> <NA>
#> 6 THIJS B <NA> <NA>
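The same long-to-wide step can also be sketched in base R. This assumes the joined long table looks like the printed left_join output above; it is typed in by hand here as matched. ave() plays the role of row_number() within each group and stats::reshape() the role of pivot_wider(); wide is a hypothetical name.

```r
# Long-format join result (as printed above); NA means no match
matched <- data.frame(
  NAME = c("SMALL H", "ZITT M", "SMITH E", "SMITH E",
           "GLANZEL W", "HUANG MH", "THIJS B"),
  address = c(NA, NA, "SOME ADDRESS 3", "SOME ADDRESS 5", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number each address within its NAME (1, 2, ...), like row_number() per group
matched$add_c <- ave(seq_along(matched$NAME), matched$NAME, FUN = seq_along)

# Spread to one column per occurrence: address_1, address_2, ...
wide <- reshape(matched, idvar = "NAME", timevar = "add_c",
                v.names = "address", direction = "wide", sep = "_")
wide
```

Unlike the grouped tibble above, the result is a plain data frame, so there is no leftover grouping to remove with ungroup().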
Data used above:
df1 <- read.delim(text = "NAME
SMALL H
ZITT M
SMITH E
GLANZEL W
HUANG MH
THIJS B", stringsAsFactors = FALSE)
df2 <- read.delim(text = "name,address
SIBLEY B,SOME ADDRESS 1
STEWART C;KOCH A,SOME ADDRESS 2
HILL GM;LEE A;SMITH E,SOME ADDRESS 3
DAVIS L,SOME ADDRESS 4
MERCIER K;SMITH E;GIBBONE A,SOME ADDRESS 5
DAVIDSON S;BEKIARI A,SOME ADDRESS 6", sep = ",", stringsAsFactors = FALSE)