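I'm trying to get R (tidyverse) to check two columns of country abbreviations (both need updating) against the code column of a master country list, and replace each abbreviation with the full country name. I tried ifelse statements, but got strange results. The dataset can be found here. Any suggestions would be very helpful.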
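You can join the two data frames on the country code. Pivoting both code columns into a long layout first means a single join covers both of them.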
library(tidyverse)
library(readxl)
df <- read_xlsx('~Data SU23 Enroll R AY22-23 2023-08-23 2 Stack Overflow.xlsx')
countrydata <- read_xlsx('~TBL Country codes.xlsx')
glimpse(df)
#> Rows: 542
#> Columns: 3
#> $ ID <chr> "F31769765", "E23531197", "Q07441087", "Y92280507", "F2688…
#> $ `MA Nation` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ `PR Nation` <chr> NA, "PK", "BG", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "M…
glimpse(countrydata)
#> Rows: 37
#> Columns: 2
#> $ CountryCode <chr> "BF", "BG", "BM", "BR", "CA", "CE", "CH", "GH", "GM", "HA"…
#> $ CountryName <chr> "BAHAMAS, THE", "BANGLADESH", "MYANMAR", "BRAZIL", "CANADA…
df %>%
  # Put all columns with country codes in a long layout
  pivot_longer(-ID) %>%
  filter(!is.na(value)) %>%
  # Join with the country code table
  left_join(countrydata,
            by = join_by(value == CountryCode)) %>%
  # Drop the country code column
  select(-value) %>%
  # Return to the two country columns layout
  pivot_wider(names_from = name,
              values_from = CountryName) %>%
  # Append the rows for IDs without country data
  bind_rows(df %>% filter(is.na(`PR Nation`) & is.na(`MA Nation`)))
#> # A tibble: 542 × 3
#> ID `PR Nation` `MA Nation`
#> <chr> <chr> <chr>
#> 1 E23531197 PAKISTAN <NA>
#> 2 Q07441087 BANGLADESH <NA>
#> 3 U79148472 MEXICO <NA>
#> 4 Y43292349 PAKISTAN <NA>
#> 5 A40257720 INDIA <NA>
#> 6 Y64624318 CHINA <NA>
#> 7 B97628594 JAPAN <NA>
#> 8 T06694322 IRELAND <NA>
#> 9 J67643839 UNITED KINGDOM <NA>
#> 10 B11219391 CHINA <NA>
#> # ℹ 532 more rows
Created on 2023-08-24 with reprex v2.0.2
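If you would rather keep the original row and column order, a lookup with `match()` inside `across()` may also work. This is only a sketch of an alternative, not a replacement for the join above. The two small tibbles are made-up stand-ins for `df` and `countrydata` so the snippet runs on its own, and `recoded` is just an illustrative name.

```r
library(dplyr)

# Toy stand-ins for the real data (values invented for illustration only)
df <- tibble(
  ID = c("A1", "B2", "C3"),
  `MA Nation` = c(NA, "CA", NA),
  `PR Nation` = c("PK", NA, NA)
)
countrydata <- tibble(
  CountryCode = c("CA", "PK"),
  CountryName = c("CANADA", "PAKISTAN")
)

# For each code column, find the position of every code in the master list
# and pull the matching name; NA or unknown codes come back as NA
recoded <- df %>%
  mutate(across(c(`MA Nation`, `PR Nation`),
                ~ countrydata$CountryName[match(.x, countrydata$CountryCode)]))

recoded
```

Note that a code missing from the master list ends up as `NA` here, just as it would with `left_join()`, so it may be worth checking for unexpected `NA`s afterwards.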