Add new rows to a data frame using group by and summarise


I want to group df by three columns and add new rows that hold the sum of the fourth column.

My data looks like this:

fc <- c("F", "F", "E", "E", "TF", "TF")
group_code <- c("Egg_x", "Egg_y", "Egg_x", "Egg_y", "Egg_x", "Egg_y")
id <- c(1, 1, 1, 1, 1, 1)
value <- c(2, 21, 4, 3, 20, 15)

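# Pass the vectors directly: cbind() would coerce every column to character,
# so value could no longer be summed.
df <- data.frame(fc, group_code, id, value, stringsAsFactors = FALSE)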

> df
  fc group_code id value
1  F      Egg_x  1     2
2  F      Egg_y  1    21
3  E      Egg_x  1     4
4  E      Egg_y  1     3
5 TF      Egg_x  1    20
6 TF      Egg_y  1    15

In this example, I want to create a new group that covers both Egg_x and Egg_y. I can do this with df$main_group <- sub('\\_.*', '', df$group_code), which gives:

> df
  fc group_code id value main_group
1  F      Egg_x  1     2  Egg
2  F      Egg_y  1    21  Egg
3  E      Egg_x  1     4  Egg
4  E      Egg_y  1     3  Egg
5 TF      Egg_x  1    20  Egg
6 TF      Egg_y  1    15  Egg
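For reference, a small sketch of what the sub() call does: the pattern matches the first underscore and everything after it, and the replacement is the empty string. The vector codes is made up here for illustration; its last element shows that a string without an underscore comes back unchanged.

```r
# Hypothetical input vector, not part of the original data
codes <- c("Egg_x", "Egg_y", "Egg")

# Drop the first underscore and everything that follows it
main <- sub("_.*", "", codes)

print(main)
```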

I want to add a new row for each value of the fc column, so I group by fc, id, and main_group and take the sum of the value column.

My final df should look like:

> df
  fc group_code id value main_group
1  F      Egg_x  1     2  Egg
2  F      Egg_y  1    21  Egg
3  F      Egg    1    23  Egg
4  E      Egg_x  1     4  Egg
5  E      Egg_y  1     3  Egg
6  E      Egg    1     7  Egg
7 TF      Egg_x  1    20  Egg
8 TF      Egg_y  1    15  Egg
9 TF      Egg    1    35  Egg 

In the df above, the value in every third row is the sum of the values in the two rows before it.

Thanks

r dataframe dplyr plyr tidyr
2 answers
0 votes

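Reshape the data frame to wide format, create a new column Egg = Egg_x + Egg_y, then convert back to long format. This starts from the original four-column df, before main_group is added.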

library(tidyverse)

df %>% 
  spread(group_code, value) %>% 
  mutate(Egg = Egg_x + Egg_y) %>% 
  gather(key = "group_code", value, -fc, -id) %>% 
  arrange(fc)
#>   fc id group_code value
#> 1  E  1      Egg_x     4
#> 2  E  1      Egg_y     3
#> 3  E  1        Egg     7
#> 4  F  1      Egg_x     2
#> 5  F  1      Egg_y    21
#> 6  F  1        Egg    23
#> 7 TF  1      Egg_x    20
#> 8 TF  1      Egg_y    15
#> 9 TF  1        Egg    35

Created on 2019-11-05 by the reprex package (v0.3.0)
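As a side note, spread() and gather() are superseded in tidyr 1.0.0 and later by pivot_wider() and pivot_longer(). Below is a minimal sketch of the same reshape with the newer functions, assuming tidyr >= 1.0.0 and the original four-column df. The name res_wide is introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Original four-column data, without main_group
df <- data.frame(
  fc         = c("F", "F", "E", "E", "TF", "TF"),
  group_code = c("Egg_x", "Egg_y", "Egg_x", "Egg_y", "Egg_x", "Egg_y"),
  id         = c(1, 1, 1, 1, 1, 1),
  value      = c(2, 21, 4, 3, 20, 15)
)

res_wide <- df %>%
  # One column per group_code: Egg_x and Egg_y
  pivot_wider(names_from = group_code, values_from = value) %>%
  # Total column for the combined group
  mutate(Egg = Egg_x + Egg_y) %>%
  # Back to long format, one row per (fc, id, group_code)
  pivot_longer(
    cols      = c(Egg_x, Egg_y, Egg),
    names_to  = "group_code",
    values_to = "value"
  ) %>%
  arrange(fc)

print(res_wide)
```

This only works when the set of sub-groups is known up front (here Egg_x and Egg_y), because the sum is written out by column name.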


0 votes

First, we create a separate data frame, df_sum, that holds the summary rows.

library(dplyr)
library(forcats)

df <-
  tibble(
    fc         = c("F", "F", "E", "E", "TF", "TF"),
    group_code = c("Egg_x", "Egg_y", "Egg_x", "Egg_y", "Egg_x", "Egg_y"),
    id         = c(1, 1, 1, 1, 1, 1),
    value      = c(2, 21, 4, 3, 20, 15)
  ) %>% 
  mutate(main_group = sub('\\_.*', '', group_code))


df_sum <-
  df %>% 
  group_by(fc, main_group, id) %>% 
  summarise(value =  sum(value)) %>% 
  mutate(group_code = main_group)

df_sum
#> # A tibble: 3 x 5
#> # Groups:   fc, main_group [3]
#>   fc    main_group    id value group_code
#>   <chr> <chr>      <dbl> <dbl> <chr>     
#> 1 E     Egg            1     7 Egg       
#> 2 F     Egg            1    23 Egg       
#> 3 TF    Egg            1    35 Egg
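As a cross-check, the same per-group sums can be computed in base R with aggregate(). This is only a sketch for comparison, not part of the answer's approach; the name sums is introduced here for illustration.

```r
# Rebuild the example data with the main_group column
df <- data.frame(
  fc         = c("F", "F", "E", "E", "TF", "TF"),
  group_code = c("Egg_x", "Egg_y", "Egg_x", "Egg_y", "Egg_x", "Egg_y"),
  id         = c(1, 1, 1, 1, 1, 1),
  value      = c(2, 21, 4, 3, 20, 15)
)
df$main_group <- sub("_.*", "", df$group_code)

# Sum value within each combination of fc, main_group and id
sums <- aggregate(value ~ fc + main_group + id, data = df, FUN = sum)

print(sums)
```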

Then bind df_sum to the original df and arrange the rows.

res <-
  bind_rows(df, df_sum) %>% 
  # fct_inorder to make sure summary rows appear after
  # original rows after sorting
  mutate(group_code = fct_inorder(group_code)) %>% 
  arrange(fc, main_group, id, group_code)

res
#> # A tibble: 9 x 5
#>   fc    group_code    id value main_group
#>   <chr> <fct>      <dbl> <dbl> <chr>     
#> 1 E     Egg_x          1     4 Egg       
#> 2 E     Egg_y          1     3 Egg       
#> 3 E     Egg            1     7 Egg       
#> 4 F     Egg_x          1     2 Egg       
#> 5 F     Egg_y          1    21 Egg       
#> 6 F     Egg            1    23 Egg       
#> 7 TF    Egg_x          1    20 Egg       
#> 8 TF    Egg_y          1    15 Egg       
#> 9 TF    Egg            1    35 Egg
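The two steps above can also be folded into one pipeline. The sketch below is a variant rather than the answer's method: it assumes dplyr >= 0.8.1 (for group_modify()) and uses tibble::add_row() to append one total row per group. The name res2 is introduced here for illustration.

```r
library(dplyr)
library(tibble)

df <-
  tibble(
    fc         = c("F", "F", "E", "E", "TF", "TF"),
    group_code = c("Egg_x", "Egg_y", "Egg_x", "Egg_y", "Egg_x", "Egg_y"),
    id         = c(1, 1, 1, 1, 1, 1),
    value      = c(2, 21, 4, 3, 20, 15)
  ) %>%
  mutate(main_group = sub("_.*", "", group_code))

res2 <-
  df %>%
  group_by(fc, id, main_group) %>%
  # .x holds the rows of one group (without the grouping columns),
  # .y is a one-row tibble with that group's keys
  group_modify(~ add_row(.x,
                         group_code = .y$main_group,
                         value      = sum(.x$value))) %>%
  ungroup()

print(res2)
```

Because the total row is appended inside each group, no factor reordering is needed to keep it after the original rows.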