How to convert an actor-year dataset into a country-year dataset in R

Question

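I am working with the following actor-year dataset, in which the country information is stored in a single variable, with multiple countries separated by commas.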

dt_initial <- data.frame(actor=c("Actor1","Actor1", "Actor2","Actor3"),year=c(2017,2018,2019,2020),
              country=c("Country1", "Country1", "Country1, Country2", "Country1, Country2, Country3"),
              amount=c(10,20,70,90))

> dt_initial
   actor year                      country amount
1 Actor1 2017                     Country1     10
2 Actor1 2018                     Country1     20
3 Actor2 2019           Country1, Country2     70
4 Actor3 2020 Country1, Country2, Country3     90

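I would like to convert this dataset into a country-year dataset, with one row for each country. In addition, I want the variable amount to be divided by the number of countries listed in the corresponding row of the initial dataset. My final dataset would be: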

dt_final <- data.frame(actor=c("Actor1", "Actor1","Actor2","Actor3", "Actor2", "Actor3", "Actor3"),year=c(2017, 2018, 2019,2020, 2019, 2020, 2020),
              country=c("Country1", "Country1", "Country1", "Country1",  "Country2", "Country2", "Country3"),
              amount=c(10, 20,35,30, 35, 30, 30))
> dt_final
   actor year  country amount
1 Actor1 2017 Country1     10
2 Actor1 2018 Country1     20
3 Actor2 2019 Country1     35
4 Actor3 2020 Country1     30
5 Actor2 2019 Country2     35
6 Actor3 2020 Country2     30
7 Actor3 2020 Country3     30

Thank you very much for your help!

r dataframe transformation
1 Answer

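We can use separate_rows to split the data into one row per country, then group_by actor and year (which together identify one row of the original data) and divide amount by the number of rows in each group. Grouping by actor alone would not be enough here: Actor1 appears in two years, so both of its amounts would be divided by 2.

library(dplyr)

dt_initial %>%
  tidyr::separate_rows(country, sep = ", ") %>%
  # actor + year identify one original row, so n() is its country count
  group_by(actor, year) %>%
  mutate(amount = amount / n()) %>%
  ungroup()

#  actor   year country  amount
#  <fct>  <dbl> <chr>     <dbl>
#1 Actor1  2017 Country1     10
#2 Actor1  2018 Country1     20
#3 Actor2  2019 Country1     35
#4 Actor2  2019 Country2     35
#5 Actor3  2020 Country1     30
#6 Actor3  2020 Country2     30
#7 Actor3  2020 Country3     30

In R 4.0 and later, actor is printed as <chr> rather than <fct>, because data.frame() no longer converts strings to factors by default.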
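
If actor and year might not uniquely identify a row, one alternative is to count the countries in each original row before splitting, so the division does not depend on any grouping key. The following is a minimal base R sketch of that idea, not the answerer's method; the names country_list, n_countries, row_idx and dt_long are illustrative and do not appear in the question.

```r
# Same toy data as in the question.
dt_initial <- data.frame(
  actor   = c("Actor1", "Actor1", "Actor2", "Actor3"),
  year    = c(2017, 2018, 2019, 2020),
  country = c("Country1", "Country1", "Country1, Country2",
              "Country1, Country2, Country3"),
  amount  = c(10, 20, 70, 90),
  stringsAsFactors = FALSE
)

# Split each country string into a character vector of countries.
country_list <- strsplit(dt_initial$country, ",\\s*")
# Number of countries listed in each original row.
n_countries <- lengths(country_list)

# Index that repeats each original row once per country it lists.
row_idx <- rep(seq_len(nrow(dt_initial)), times = n_countries)

dt_long <- data.frame(
  actor   = dt_initial$actor[row_idx],
  year    = dt_initial$year[row_idx],
  country = unlist(country_list),
  # Divide first, then repeat, so each country gets an equal share.
  amount  = (dt_initial$amount / n_countries)[row_idx],
  stringsAsFactors = FALSE
)

# Order by country, then year, to match the layout of dt_final.
dt_long <- dt_long[order(dt_long$country, dt_long$year), ]
rownames(dt_long) <- NULL
```

A quick sanity check for either approach is that the split amounts still add up to the original total, for example all.equal(sum(dt_long$amount), sum(dt_initial$amount)).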