在 r 中按日期聚合列数据

问题描述 投票:0回答:1

从给定的数据框中,我想按

date
列聚合数据。

date <- c("2020-01-10", "2020-01-10", "2020-01-10", "2020-01-10", 
"2020-01-10", "2020-01-11","2020-01-11", "2020-01-11", "2020-01-11","2020-01-11",
 "2020-01-12", "2020-01-12", "2020-01-12", "2020-01-12", "2020-01-12", 
"2020-01-13","2020-01-13", "2020-01-13", "2020-01-13","2020-01-13")
ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
assets <- c(1, 2, 3, 10, 9, 21, 23,1, 1, 3, 11,11, 6,12,13,5,2 ,15,12,12)
category <- c("new", "new", "new","new","new", 
"new", "new", "old", "old", "old", "old", "old", "old", "expired","expired","expired","expired"
,"expired","expired","expired")

df1 <- data.frame(ID, date, assets, category)

此代码转换数据框,还添加新列

total
,它是
new + old + expired

的总和
df_transformed <- df1 %>%
  #group_by(date)>
  pivot_wider(names_from = category, values_from = assets)%>%
  mutate(total = c(new + old + expired))
print(df_transformed)
      ID date         new   old expired total
   <dbl> <chr>      <dbl> <dbl>   <dbl> <dbl>
 1     1 2020-01-10     1    NA      NA    NA
 2     2 2020-01-10     2    NA      NA    NA
 3     3 2020-01-10     3    NA      NA    NA
 4     4 2020-01-10    10    NA      NA    NA
 5     5 2020-01-10     9    NA      NA    NA
 6     6 2020-01-11    21    NA      NA    NA
 7     7 2020-01-11    23    NA      NA    NA
 8     8 2020-01-11    NA     1      NA    NA
 9     9 2020-01-11    NA     1      NA    NA
10    10 2020-01-11    NA     3      NA    NA
11    11 2020-01-12    NA    11      NA    NA
12    12 2020-01-12    NA    11      NA    NA
13    13 2020-01-12    NA     6      NA    NA
14    14 2020-01-12    NA    NA      12    NA
15    15 2020-01-12    NA    NA      13    NA
16    16 2020-01-13    NA    NA       5    NA
17    17 2020-01-13    NA    NA       2    NA
18    18 2020-01-13    NA    NA      15    NA
19    19 2020-01-13    NA    NA      12    NA
20    20 2020-01-13    NA    NA      12    NA

我希望实现以下成果。原始数据帧有很多 NA,所以我希望这些日期的总和为零。

      ID date           new           old              expired         total

 1     1 2020-01-10   sum of (01-10)  sum of (01-10)  sum of (01-10)   new + old + expired 
 2     2 2020-01-11   sum of (01-11)  sum of (01-11)  sum of (01-11)   new + old + expired
 3     3 2020-01-12   sum of (01-12)  sum of (01-12)  sum of (01-12)   new + old + expired
 4     4 2020-01-13   sum of (01-13)  sum of (01-13)  sum of (01-13)   new + old + expired

r dataframe dplyr aggregate data-wrangling
1个回答
0
投票

您可能需要

summarise
跨越所有内容来处理
NAs
函数中的
sum()

df_transformed |>
    group_by(date) |>
    summarise(across(everything(), .f = sum, na.rm = TRUE)) |>
    mutate(total = new + old + expired)

输出:

A tibble: 4 x 6
date    ID  new old expired total
<chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
2020-01-10  15  25  0   0   25
2020-01-11  40  44  5   0   49
2020-01-12  65  0   28  25  53
2020-01-13  90  0   0   46  46
© www.soinside.com 2019 - 2024. All rights reserved.