How to sum selected columns row-wise and bind the result into a similarly formatted dataset


I have a dataset that looks like this.

 Day  Population  Red  Yellow  Orange  Green
   1          30   15       3       4      8
   2          50   10      30       5      5
   3          10    3       6       1      0
   4          25    2      10      10      3

I would like to create something like this:

 Day  Color              Population
   1  Green                       8
   1  Red,Orange,Yellow          22
   2  Green                       5
   2  Red,Orange,Yellow          45
   3  Green                       0
   3  Red,Orange,Yellow          10
   4  Green                       3
   4  Red,Orange,Yellow          22

I have something like this, but it doesn't work:

df<- rbind(
           summarise(df,Day,Population=df$Green,Color="Green"),
           summarise(df,Day,Population=sum(df$Red,df$Yellow,df$Orange),
           Color="Red,Orange,Yellow")) 


 
r aggregate summarize
1 Answer

Here is one approach using data.table:
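library(data.table)

# Color columns are everything after Day and Population
colors = names(df)[3:length(names(df))]
target = "Green"
non_targets = setdiff(colors, target)

# Convert df to a data.table by reference
setDT(df)

rbindlist(list(
  # One row per day for the target color
  df[, .(Day, Color = target, Population = get(target))],
  # One row per day for the rest; this relies on Population being the
  # sum of all color columns, as it is in the sample data
  df[, .(Day, Color = paste0(non_targets, collapse="|"), Population=Population-get(target))]
))[order(Day)]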

Output:

   Day             Color Population
1:   1             Green          8
2:   1 Red|Yellow|Orange         22
3:   2             Green          5
4:   2 Red|Yellow|Orange         45
5:   3             Green          0
6:   3 Red|Yellow|Orange         10
7:   4             Green          3
8:   4 Red|Yellow|Orange         22
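
The data.table snippet above assumes `df` already exists, and it derives the combined count as Population minus Green, which only holds when Population equals the sum of the color columns (as it does in the sample data). A minimal self-contained sketch that rebuilds the sample data from the question, sums the non-target columns directly with `rowSums(.SD)`, and joins the names with a comma as in the desired output might look like this. The `result` name and the choice of `rowSums` are assumptions for illustration, not part of the original answer.

```r
library(data.table)

# Sample data copied from the question
df <- data.table(
  Day        = 1:4,
  Population = c(30L, 50L, 10L, 25L),
  Red        = c(15L, 10L, 3L, 2L),
  Yellow     = c(3L, 30L, 6L, 10L),
  Orange     = c(4L, 5L, 1L, 10L),
  Green      = c(8L, 5L, 0L, 3L)
)

target      <- "Green"
# Every column that is neither an id column nor the target color
non_targets <- setdiff(names(df), c("Day", "Population", target))

result <- rbindlist(list(
  # One row per day for the target color
  df[, .(Day, Color = target, Population = get(target))],
  # One row per day holding the row-wise sum of the remaining colors;
  # .SD contains only the columns named in .SDcols
  df[, .(Day,
         Color      = paste(non_targets, collapse = ","),
         Population = rowSums(.SD)),
     .SDcols = non_targets]
))[order(Day)]

print(result)
```

Summing the columns explicitly keeps the result correct even if Population ever differs from the sum of the color columns.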

Another approach is to pivot the data to long format and work from there. This is illustrated using dplyr and tidyr:

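library(dplyr)
library(tidyr)

# Color columns are everything after Day and Population
colors = names(df)[3:length(names(df))]
target = "Green"
non_targets = setdiff(colors, target)

# One row per Day/Color pair, with the count in `value`
df_long = pivot_longer(df, -c(Day:Population), names_to = "Color")

bind_rows(
    # Rows for the target color
    df_long %>%
        filter(Color==target) %>%
        select(Day,
               Color,
               Population=value
               ),
    # Per day: total over all colors minus the target color
    df_long %>%
        group_by(Day) %>%
        summarize(Population = sum(value) - value[Color==target]) %>%
        mutate(Color = paste0(non_targets,
                              collapse="|"
                              )
               )
    ) %>%
arrange(Day)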

Output:

# A tibble: 8 × 3
    Day Color             Population
  <int> <chr>                  <int>
1     1 Green                      8
2     1 Red|Yellow|Orange         22
3     2 Green                      5
4     2 Red|Yellow|Orange         45
5     3 Green                      0
6     3 Red|Yellow|Orange         10
7     4 Green                      3
8     4 Red|Yellow|Orange         22
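
As for why the attempt in the question fails: `sum(df$Red, df$Yellow, df$Orange)` adds every value in all three columns into a single grand total instead of producing one sum per row, and `summarise()` is meant for collapsing rows rather than for building row-wise columns. A minimal sketch of a corrected version, assuming the sample data from the question and using the vectorized `+` operator with `transmute()`, could be written as follows. The `result` name is an assumption for illustration.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  Day        = 1:4,
  Population = c(30L, 50L, 10L, 25L),
  Red        = c(15L, 10L, 3L, 2L),
  Yellow     = c(3L, 30L, 6L, 10L),
  Orange     = c(4L, 5L, 1L, 10L),
  Green      = c(8L, 5L, 0L, 3L)
)

result <- bind_rows(
  # Green on its own: one row per day
  transmute(df, Day, Color = "Green", Population = Green),
  # `+` is vectorized, so this yields one sum per row
  transmute(df, Day,
            Color      = "Red,Orange,Yellow",
            Population = Red + Orange + Yellow)
) %>%
  # Sort by day, then by label so "Green" comes first within each day
  arrange(Day, Color)

print(result)
```

This keeps the shape of the original attempt (two pieces stacked together) and only changes how each piece is computed.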