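I'm working with the following actor-year dataset, in which the country information is stored in a single variable, with the countries separated by commas.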
dt_initial <- data.frame(actor=c("Actor1","Actor1", "Actor2","Actor3"),year=c(2017,2018,2019,2020),
country=c("Country1", "Country1", "Country1, Country2", "Country1, Country2, Country3"),
amount=c(10,20,70,90))
> dt_initial
   actor year                      country amount
1 Actor1 2017                     Country1     10
2 Actor1 2018                     Country1     20
3 Actor2 2019           Country1, Country2     70
4 Actor3 2020 Country1, Country2, Country3     90
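I'd like to convert this into a country-year dataset with one row per country. In addition, I'd like the variable amount to be divided by the number of countries listed in the corresponding row of the initial dataset. My final dataset would be: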
dt_final <- data.frame(actor=c("Actor1", "Actor1","Actor2","Actor3", "Actor2", "Actor3", "Actor3"),year=c(2017, 2018, 2019,2020, 2019, 2020, 2020),
country=c("Country1", "Country1", "Country1", "Country1", "Country2", "Country2", "Country3"),
amount=c(10, 20,35,30, 35, 30, 30))
> dt_final
   actor year  country amount
1 Actor1 2017 Country1     10
2 Actor1 2018 Country1     20
3 Actor2 2019 Country1     35
4 Actor3 2020 Country1     30
5 Actor2 2019 Country2     35
6 Actor3 2020 Country2     30
7 Actor3 2020 Country3     30
Thank you very much for your help!
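We can use separate_rows to split each comma-separated entry into one row per country, then group_by the columns that identify an original row (here actor and year) and divide amount by the number of rows in each group. Grouping by actor alone would not work for this data: Actor1 appears in two years, so each of its amounts would be divided by 2.
library(dplyr)
dt_initial %>%
  tidyr::separate_rows(country, sep = ", ") %>%
  # one group per original row, assuming actor + year is unique
  group_by(actor, year) %>%
  mutate(amount = amount / n()) %>%
  ungroup()
#  actor   year country  amount
#  <fct>  <dbl> <chr>     <dbl>
#1 Actor1  2017 Country1     10
#2 Actor1  2018 Country1     20
#3 Actor2  2019 Country1     35
#4 Actor2  2019 Country2     35
#5 Actor3  2020 Country1     30
#6 Actor3  2020 Country2     30
#7 Actor3  2020 Country3     30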
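An alternative that avoids grouping altogether is to count the countries in each row before splitting, so the result does not depend on which columns happen to identify a row. The following is only a sketch under a few assumptions: the separator is always ", ", country is a character column (hence stringsAsFactors = FALSE), and the names n_countries and dt_country_year are illustrative, not from the question.

```r
library(dplyr)
library(tidyr)

# Data from the question; stringsAsFactors = FALSE keeps country as character
dt_initial <- data.frame(
  actor = c("Actor1", "Actor1", "Actor2", "Actor3"),
  year = c(2017, 2018, 2019, 2020),
  country = c("Country1", "Country1", "Country1, Country2",
              "Country1, Country2, Country3"),
  amount = c(10, 20, 70, 90),
  stringsAsFactors = FALSE
)

dt_country_year <- dt_initial %>%
  # count the countries listed in each original row (hypothetical helper column)
  mutate(n_countries = lengths(strsplit(country, ", ", fixed = TRUE))) %>%
  # split the amount evenly across those countries
  mutate(amount = amount / n_countries) %>%
  # one row per country
  separate_rows(country, sep = ", ") %>%
  select(-n_countries) %>%
  # order by country, then year, to match the layout of dt_final
  arrange(country, year)

dt_country_year
```

Because the division happens before the split, the total of amount is unchanged (190 in both datasets), which is a quick way to check the result.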