I have a data frame like this:
df <- data.frame("region" = c("Spain", "Barcelona", "Madrid",
"France", "Paris", "Lyon",
"Belgium", "Bruges", "Brussels"),
"2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))
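I want to concatenate the country name with each city name. Every country row has NA in the 2011 column, and each country's cities follow directly after its country row.
The data frame I want looks like this: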
desired_df <- data.frame("region" = c("Spain_Spain", "Spain_Barcelona", "Spain_Madrid",
"France_France", "France_Paris", "France_Lyon",
"Belgium_Belgium", "Belgium_Bruges", "Belgium_Brussels"),
"2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))
It's fine if the country_country rows are dropped. Any help would be greatly appreciated.
A general solution using the tidyverse needs to filter the countries out from the rest of the data and then join them back in:
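library(dplyr)
# Every NA in X2011 starts a new country block, so cumsum() gives a group id
df_gr <- df %>% mutate(gr = cumsum(is.na(X2011)))
# `countries` was not defined in the original snippet; assumed here to be
# the country rows (NA in X2011) with their group id
countries <- df_gr %>%
  filter(is.na(X2011)) %>%
  select(country = region, gr)
# Keep the city rows and join the country name back in
df_gr %>%
  filter(!is.na(X2011)) %>%
  left_join(countries, by = "gr") %>%
  mutate(new_region = paste(country, region, sep = "_")) %>%
  select(-gr)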
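We can create a grouping variable based on where the country names occur, then paste the first element of 'region' in each group onto every element of 'region' in that group to update the 'region' column: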
library(dplyr)
library(stringr)
df %>%
  group_by(grp = cumsum(region %in% c("Spain", "France", "Belgium"))) %>%
  mutate(region = str_c(first(region), region, sep="_")) %>%
  ungroup %>%
  select(-grp)
# A tibble: 9 x 3
#  region           X2010 X2011
#  <chr>            <int> <dbl>
#1 Spain_Spain          1    NA
#2 Spain_Barcelona      2     1
#3 Spain_Madrid         3     2
#4 France_France        4    NA
#5 France_Paris         5     3
#6 France_Lyon          6     4
#7 Belgium_Belgium      7    NA
#8 Belgium_Bruges       8     5
#9 Belgium_Brussels     9     6
Or, as @akash87 mentioned, if the pattern should be based on 'X2011':
df %>%
group_by(grp = cumsum(is.na(X2011))) %>%
mutate(region = str_c(first(region), region, sep="_")) %>%
ungroup %>%
select(-grp)
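To see why cumsum(is.na(X2011)) works as a grouping variable, here is a minimal base R sketch of the same idea with no packages. It assumes, as in the question's data, that the first row is a country row and that NA in X2011 occurs only on country rows; the names grp, countries, country and result are just illustrative.

```r
# Same data as in the question; data.frame() renames "2010"/"2011" to X2010/X2011
df <- data.frame("region" = c("Spain", "Barcelona", "Madrid",
                              "France", "Paris", "Lyon",
                              "Belgium", "Bruges", "Brussels"),
                 "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))

# Each NA in X2011 marks a country row, so the running count of NAs
# increases by one at the start of every country block
grp <- cumsum(is.na(df$X2011))
grp
# [1] 1 1 1 2 2 2 3 3 3

# Country names in order of appearance (assumption: NA only on country rows)
countries <- as.character(df$region[is.na(df$X2011)])

# Look up each row's country by its group id
country <- countries[grp]

# Prefix every region with its country
result <- df
result$region <- paste(country, df$region, sep = "_")
```

If the country_country rows are not wanted, result[!is.na(result$X2011), ] should drop them under the same assumption.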