I have a tidy data frame in a simple format:
group variable value
<fct> <chr> <dbl>
1 fishers_here 100
1 money_per_fisher 2000
1 unnecessary_variable 10
2 fishers_here 140
2 money_per_fisher 8000
2 unnecessary_variable 304
3 fishers_here 10
3 money_per_fisher 9000
....
For each group I want a variable "total money in group", which is simply fishers_here * money_per_fisher. Basically, I want it to look like this:
group variable value
<fct> <chr> <dbl>
1 fishers_here 100
1 money_per_fisher 2000
1 unnecessary_variable 10
1 TOTAL_MONEY 200000
....
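Is there a simple way to do this with the tidyverse? By simple I mean without having to filter, summarise, add the variable column afterwards, and then bind the two separate dataframes back together.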
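You can spread, do the multiplication, then gather back up. Note: I assumed there is a typo in the group number on row 6 because, as I commented, it should be group 2 rather than group 1. If that is not the case, some additional cleaning steps are needed. You can also sort the resulting rows however you want (e.g. to put each group's rows back together).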
library(tidyverse)
tbl <- read_table2(
"group variable value
1 fishers_here 100
1 money_per_fisher 2000
1 unnecessary_variable 10
2 fishers_here 140
2 money_per_fisher 8000
2 unnecessary_variable 304
3 fishers_here 10
3 money_per_fisher 9000"
)
tbl %>%
  spread(variable, value) %>%                                        # wide: one column per variable
  mutate(total_money_in_group = money_per_fisher * fishers_here) %>% # new per-group column
  gather(variable, value, -group)                                    # back to long format
#> # A tibble: 12 x 3
#> group variable value
#> <dbl> <chr> <dbl>
#> 1 1 fishers_here 100
#> 2 2 fishers_here 140
#> 3 3 fishers_here 10
#> 4 1 money_per_fisher 2000
#> 5 2 money_per_fisher 8000
#> 6 3 money_per_fisher 9000
#> 7 1 unnecessary_variable 10
#> 8 2 unnecessary_variable 304
#> 9 3 unnecessary_variable NA
#> 10 1 total_money_in_group 200000
#> 11 2 total_money_in_group 1120000
#> 12 3 total_money_in_group 90000
Created on 2019-02-04 by the reprex package (v0.2.1)
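spread() and gather() still work but have been superseded since tidyr 1.0. A minimal sketch of the same idea with pivot_wider() and pivot_longer(), assuming tidyr >= 1.0 is installed. values_drop_na = TRUE drops the NA that group 3 would otherwise get for unnecessary_variable, and arrange(group) puts each group's rows back together.

```r
library(dplyr)
library(tidyr)

# Same data as in the question (group 3 has no unnecessary_variable row)
tbl <- tibble(
  group = c(1, 1, 1, 2, 2, 2, 3, 3),
  variable = c("fishers_here", "money_per_fisher", "unnecessary_variable",
               "fishers_here", "money_per_fisher", "unnecessary_variable",
               "fishers_here", "money_per_fisher"),
  value = c(100, 2000, 10, 140, 8000, 304, 10, 9000)
)

result <- tbl %>%
  pivot_wider(names_from = variable, values_from = value) %>%   # one column per variable
  mutate(total_money_in_group = money_per_fisher * fishers_here) %>%
  pivot_longer(-group, names_to = "variable", values_to = "value",
               values_drop_na = TRUE) %>%                        # back to long, drop NA rows
  arrange(group)                                                 # keep each group's rows together

print(result)
```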
One option is to filter the "money_per_fisher" and "fishers_here" rows, group by "group", summarise to get the prod of "value", bind the rows with the original data, and arrange by "group".
library(tidyverse)
df1 %>%
  filter(variable %in% c('fishers_here', 'money_per_fisher')) %>%           # keep only the two factors
  group_by(group) %>%
  summarise(variable = "total_money_in_group", value = prod(value)) %>%  # one total row per group
  bind_rows(df1, .) %>%                                                   # append totals to the original data
  arrange(group)
# A tibble: 11 x 3
# group variable value
# <int> <chr> <dbl>
# 1 1 fishers_here 100
# 2 1 money_per_fisher 2000
# 3 1 unnecessary_variable 10
# 4 1 total_money_in_group 200000
# 5 2 fishers_here 140
# 6 2 money_per_fisher 8000
# 7 2 unnecessary_variable 304
# 8 2 total_money_in_group 1120000
# 9 3 fishers_here 10
#10 3 money_per_fisher 9000
#11 3 total_money_in_group 90000
# data
df1 <- structure(list(group = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
variable = c("fishers_here",
"money_per_fisher", "unnecessary_variable", "fishers_here", "money_per_fisher",
"unnecessary_variable", "fishers_here", "money_per_fisher"),
value = c(100L, 2000L, 10L, 140L, 8000L, 304L, 10L, 9000L
)), class = "data.frame", row.names = c(NA, -8L))
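If the goal is to avoid the separate filter/summarise/bind step that the question mentions, one possible single-pipeline variant appends the total row inside each group with group_modify() (dplyr >= 0.8.1) and tibble::add_row(). This is a sketch, assuming every group has exactly one fishers_here row and one money_per_fisher row; a group missing either would need extra handling.

```r
library(dplyr)
library(tibble)

# Same data as df1 above
df1 <- data.frame(
  group = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  variable = c("fishers_here", "money_per_fisher", "unnecessary_variable",
               "fishers_here", "money_per_fisher", "unnecessary_variable",
               "fishers_here", "money_per_fisher"),
  value = c(100L, 2000L, 10L, 140L, 8000L, 304L, 10L, 9000L),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  group_by(group) %>%
  # .x is one group's rows (without the group column); append a total row to it
  group_modify(~ add_row(
    .x,
    variable = "total_money_in_group",
    value = .x$value[.x$variable == "fishers_here"] *
      .x$value[.x$variable == "money_per_fisher"]
  )) %>%
  ungroup()

print(result)
```

The rows stay in group order because group_modify() returns the groups in sorted order, so no arrange() is needed.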
Based on your output, I think this is a possible solution:
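df1 %>%   # df1 as defined in the previous answer
  group_by(group) %>%
  # multiply only the two relevant variables; unnecessary_variable must be excluded
  summarise(value = prod(value[variable %in% c("fishers_here", "money_per_fisher")]))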
Edit: if you want the result as a column in the original dataset, you can use mutate instead of summarise.
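A minimal sketch of that mutate variant, using the same df1 data. The column name total_money_in_group is borrowed from the other answers. The grouped total is repeated on every row of its group rather than added as a new row.

```r
library(dplyr)

# Same data as df1 above
df1 <- data.frame(
  group = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  variable = c("fishers_here", "money_per_fisher", "unnecessary_variable",
               "fishers_here", "money_per_fisher", "unnecessary_variable",
               "fishers_here", "money_per_fisher"),
  value = c(100L, 2000L, 10L, 140L, 8000L, 304L, 10L, 9000L),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  group_by(group) %>%
  # within each group, multiply only fishers_here and money_per_fisher
  mutate(total_money_in_group =
           prod(value[variable %in% c("fishers_here", "money_per_fisher")])) %>%
  ungroup()

print(result)
```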