Rolling sum over a fixed number of observations, with the start and end date of each window, in R


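I want to use data.table's frollsum() to capture a running sum of a team's wins over its last 5 games. That part is simple, but I would also like to capture the start date and end date of each rolling 5-game window in their own columns, so that I can identify the date range each rolling sum covers.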

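Here is a dput of a small example for quick use. The first and last columns in the summarise() call know nothing about the rolling window and use all of the data in each group, so they do not capture what I am after.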

library(tidyverse)
library(data.table)

df <- structure(list(date = structure(c(19655, 19655, 19657, 19657, 
19658, 19658, 19660, 19660, 19662, 19662, 19663, 19664, 19665, 
19666, 19667, 19667), class = "Date"), team = c("Detroit Pistons", 
"Chicago Bulls", "Detroit Pistons", "Chicago Bulls", "Detroit Pistons", 
"Chicago Bulls", "Chicago Bulls", "Detroit Pistons", "Detroit Pistons", 
"Chicago Bulls", "Detroit Pistons", "Chicago Bulls", "Chicago Bulls", 
"Detroit Pistons", "Detroit Pistons", "Chicago Bulls"), result = c("lose", 
"lose", "win", "win", "win", "lose", "win", "lose", "lose", "lose", 
"lose", "lose", "lose", "lose", "lose", "win")), row.names = c(NA, 
16L), class = "data.frame")

df |>
  group_by(team) |> 
  summarise(roll_wins_fifty = frollsum(result == "win", n = 5),
  first = first(date),
  last = last(date))
r data.table rolling-computation
1 Answer

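I'm not sure whether this is what you are looking for: we can simply use the current date as last and its lagged value (n = 4) as first, which gives the first and last observation date for each rolling result. The lag is 4 rather than 5 because a 5-game window that ends on the current row starts 4 rows earlier.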

library(tidyverse)
library(data.table)

df |>
  arrange(team) |> # for better printing
  group_by(team) |> 
  mutate(roll_wins_fifty = frollsum(result == "win", n = 5),
         last = date,
         first = lag(date, n = 4)
         )
#> # A tibble: 16 × 6
#> # Groups:   team [2]
#>    date       team            result roll_wins_fifty last       first     
#>    <date>     <chr>           <chr>            <dbl> <date>     <date>    
#>  1 2023-10-25 Chicago Bulls   lose                NA 2023-10-25 NA        
#>  2 2023-10-27 Chicago Bulls   win                 NA 2023-10-27 NA        
#>  3 2023-10-28 Chicago Bulls   lose                NA 2023-10-28 NA        
#>  4 2023-10-30 Chicago Bulls   win                 NA 2023-10-30 NA        
#>  5 2023-11-01 Chicago Bulls   lose                 2 2023-11-01 2023-10-25
#>  6 2023-11-03 Chicago Bulls   lose                 2 2023-11-03 2023-10-27
#>  7 2023-11-04 Chicago Bulls   lose                 1 2023-11-04 2023-10-28
#>  8 2023-11-06 Chicago Bulls   win                  2 2023-11-06 2023-10-30
#>  9 2023-10-25 Detroit Pistons lose                NA 2023-10-25 NA        
#> 10 2023-10-27 Detroit Pistons win                 NA 2023-10-27 NA        
#> 11 2023-10-28 Detroit Pistons win                 NA 2023-10-28 NA        
#> 12 2023-10-30 Detroit Pistons lose                NA 2023-10-30 NA        
#> 13 2023-11-01 Detroit Pistons lose                 2 2023-11-01 2023-10-25
#> 14 2023-11-02 Detroit Pistons lose                 2 2023-11-02 2023-10-27
#> 15 2023-11-05 Detroit Pistons lose                 1 2023-11-05 2023-10-28
#> 16 2023-11-06 Detroit Pistons lose                 0 2023-11-06 2023-10-30
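
Since the question asks about data.table specifically, the same idea can also be written without dplyr. The following is only a sketch, assuming the data are sorted by date within each team; the names games, win_size, roll_wins, window_start and window_end are made up for illustration and do not come from the original post. It uses shift() as the data.table counterpart of lag().

```r
library(data.table)

# Same games as the OP's data, rebuilt compactly
# (the numbers are days since 1970-01-01).
games <- data.table(
  date = as.Date(c(19655, 19655, 19657, 19657, 19658, 19658, 19660, 19660,
                   19662, 19662, 19663, 19664, 19665, 19666, 19667, 19667),
                 origin = "1970-01-01"),
  team = c("Detroit Pistons", "Chicago Bulls", "Detroit Pistons", "Chicago Bulls",
           "Detroit Pistons", "Chicago Bulls", "Chicago Bulls", "Detroit Pistons",
           "Detroit Pistons", "Chicago Bulls", "Detroit Pistons", "Chicago Bulls",
           "Chicago Bulls", "Detroit Pistons", "Detroit Pistons", "Chicago Bulls"),
  result = c("lose", "lose", "win", "win", "win", "lose", "win", "lose",
             "lose", "lose", "lose", "lose", "lose", "lose", "lose", "win")
)

win_size <- 5L  # hypothetical name for the number of games per window

# Put each team's games in chronological order before rolling.
setorder(games, team, date)

# Per team: rolling win count, plus the dates of the first and last game
# in the window that ends on the current row.
games[, `:=`(
  roll_wins    = frollsum(as.integer(result == "win"), n = win_size),
  window_start = shift(date, n = win_size - 1L),  # NA until a full window exists
  window_end   = date
), by = team]

# Keep only rows where a complete window is available.
games[!is.na(roll_wins)]
```

Rows with fewer than win_size games of history keep NA in both roll_wins and window_start, which matches the NA rows in the tibble above. Changing win_size adjusts the rolling sum and the lag together, so the two cannot drift apart.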

Data from the OP

df <- structure(list(date = structure(c(19655, 19655, 19657, 19657, 
                                        19658, 19658, 19660, 19660, 19662, 19662, 19663, 19664, 19665, 
                                        19666, 19667, 19667), class = "Date"),
                     team = c("Detroit Pistons", "Chicago Bulls", "Detroit Pistons", "Chicago Bulls", "Detroit Pistons",
                              "Chicago Bulls", "Chicago Bulls", "Detroit Pistons", "Detroit Pistons", 
                              "Chicago Bulls", "Detroit Pistons", "Chicago Bulls", "Chicago Bulls", 
                              "Detroit Pistons", "Detroit Pistons", "Chicago Bulls"),
                     result = c("lose", "lose", "win", "win", "win", "lose", "win", "lose", "lose", "lose", "lose", "lose", "lose", "lose", "lose", "win")),
                row.names = c(NA, 16L), class = "data.frame")

Created on 2023-12-17 with reprex v2.0.2
