Concatenate specific strings in a column


I have a data frame like this:

df <- data.frame("region" = c("Spain", "Barcelona", "Madrid",
                          "France", "Paris", "Lyon", 
                          "Belgium", "Bruges", "Brussels"), 
             "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))

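I want to join each country name to its city names. Every country row has NA in the 2011 column, and each country's cities come directly after the country row.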

The data frame I want looks like this:

desired_df <- data.frame("region" = c("Spain_Spain", "Spain_Barcelona", "Spain_Madrid",
                          "France_France", "France_Paris", "France_Lyon",
                          "Belgium_Belgium", "Belgium_Bruges", "Belgium_Brussels"), 
             "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))

It's fine if the country_country rows are lost. Any help would be appreciated.

r paste
2 Answers
0 votes

A general tidyverse solution filters the country rows out of the rest of the data and joins them back on a group id:

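library(dplyr)
# gr increments at every country row (NA in X2011)
df_gr <- df %>% 
  mutate(gr = cumsum(is.na(X2011)))
# Country rows only; region is renamed to country for the join
countries <- df_gr %>% 
  filter(is.na(X2011)) %>% 
  select(country = region, gr)
df_gr %>% 
  filter(!is.na(X2011)) %>% 
  left_join(countries, by = "gr") %>% 
  mutate(new_region = paste(country, region, sep = "_")) %>% 
  select(-gr)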
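The `gr` column in the answer above comes from `cumsum(is.na(X2011))`. Each NA marks a country row, so the running count of NAs gives every city the same id as the country row above it. Here is a minimal base-R sketch showing the ids this produces on the question's data. It assumes `stringsAsFactors = FALSE` and that the first row is a country row; if it isn't, the leading cities get id 0.

```r
# Same data as in the question; check.names = TRUE (the default)
# turns "2010"/"2011" into the column names X2010/X2011
df <- data.frame(region = c("Spain", "Barcelona", "Madrid",
                            "France", "Paris", "Lyon",
                            "Belgium", "Bruges", "Brussels"),
                 "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6),
                 stringsAsFactors = FALSE)

# Running count of NAs: increments on each country row
gr <- cumsum(is.na(df$X2011))
print(data.frame(region = df$region, gr = gr))
# gr: 1 1 1 2 2 2 3 3 3
```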

0 votes

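We can create a grouping variable from where the country names occur, then use `str_c` to paste the `first` element of 'region' in each group in front of every 'region' value in that group, updating the 'region' column: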

library(dplyr)
library(stringr)
df %>%
   group_by(grp = cumsum(region %in% c("Spain", "France", "Belgium"))) %>%
   mutate(region = str_c(first(region), region, sep="_")) %>%
   ungroup %>% 
   select(-grp)
# A tibble: 9 x 3
#  region           X2010 X2011
#  <chr>            <int> <dbl>
#1 Spain_Spain          1    NA
#2 Spain_Barcelona      2     1
#3 Spain_Madrid         3     2
#4 France_France        4    NA
#5 France_Paris         5     3
#6 France_Lyon          6     4
#7 Belgium_Belgium      7    NA
#8 Belgium_Bruges       8     5
#9 Belgium_Brussels     9     6
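The same group-then-paste idea also works without dplyr. Below is a base-R sketch that uses `ave()`, which applies `FUN` within each group and returns a vector aligned with the original rows. This is a swapped-in alternative, not part of the answer. It assumes `stringsAsFactors = FALSE` and that the first row is a country; otherwise the leading cities form their own group and get prefixed by their own first city.

```r
df <- data.frame(region = c("Spain", "Barcelona", "Madrid",
                            "France", "Paris", "Lyon",
                            "Belgium", "Bruges", "Brussels"),
                 "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6),
                 stringsAsFactors = FALSE)

# Group id from the country names, as in the dplyr answer
grp <- cumsum(df$region %in% c("Spain", "France", "Belgium"))

# Within each group, x[1] is the country name; prefix it to every value
df$region <- ave(df$region, grp, FUN = function(x) paste(x[1], x, sep = "_"))
df
```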

Or, as @akash87 noted, if the grouping should be based on the NAs in 'X2011':

df %>%
   group_by(grp = cumsum(is.na(X2011))) %>%
   mutate(region = str_c(first(region), region, sep="_")) %>%
   ungroup %>% 
   select(-grp)
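Because the question allows the country_country rows to be dropped, here is one more base-R sketch. It looks up each row's country by indexing the country names with the same `cumsum(is.na(X2011))` id, then keeps only the city rows. It assumes `stringsAsFactors = FALSE` and that the first row is a country row; an id of 0 would silently drop elements from the lookup.

```r
df <- data.frame(region = c("Spain", "Barcelona", "Madrid",
                            "France", "Paris", "Lyon",
                            "Belgium", "Bruges", "Brussels"),
                 "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6),
                 stringsAsFactors = FALSE)

is_country <- is.na(df$X2011)
# The k-th country name serves every row in group k
country <- df$region[is_country][cumsum(is_country)]

# Keep only the city rows and prefix each with its country
cities <- df[!is_country, ]
cities$region <- paste(country[!is_country], cities$region, sep = "_")
cities
```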