你会如何使用tidyverse来计算下一个出局者的数量,从过去的表现。

问题描述 投票:0回答:1

Next out Winners是指一起参加某场比赛的参赛者(RaceA),在该场比赛后,每个参赛者参加另一场比赛,RaceB(i)。RaceB(i)代表RaceA的每个参赛者接下来参加的下一场比赛。RaceB(i)可以是不同的,也可以是相同的(不要求是相同的比赛)。

Next Out Winners是RaceA(和其他比赛)的指标,反映了在某个指定时间范围内(例如90天)赢得RaceB(i)的选手数量。在指定的时间范围之后进行的比赛不包括在计算Next Out Winners的时候。

我的问题是,如何使用tidyverse来计算这个问题?我假设purrr函数将发挥突出的作用。

解决问题的逻辑:key inputs: 赛事、日期、选手和完赛位置.data: 赛事数据框,按升序排列,即日期、赛号和选手。

关键函数(s): map2_xxx或pmap函数,用于计算所有在90天内再次参加比赛并获胜(完赛=1)的马匹。

疑问:在未来90天内再次参加比赛并获胜(完赛率=1)的所有马匹的总和。: 我们的函数如何知道每匹马只回顾1场下一场比赛)。) 遍历数据框中的每一场比赛,效率会非常低。我已经创建了一个小样本数据框架,并将其包含在一个reprex中。我还在继续努力,但欢迎大家提供任何协助或建议。

Reprex:

library(tidyverse)
library(lubridate)
library(reprex)
#> Warning: package 'reprex' was built under R version 3.5.3

jdiff <- function(x, y, t="equal", i=0, j=0){
  case_when(
    t == 'equal'~ if_else(x ==y,1L,0L),
    t == 'rng' ~ if_else(x >= y-i & x <= y+j,1L,0L),
    TRUE ~ 0L
  )

}

df<-tibble(runner=c("D.Wottle","D.Wottle","D.Wottle","D.Wottle","D.Wottle","D.Wottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt"),
           race_number=c(6,5,4,3,2,1,6,5,4,3,2,1,6,5,4,3,2,1),
           race_venue=c("xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx","xyx"),
           race_date=c(ymd('20170625'),ymd('20170524'),ymd('20170420'),ymd('20170329'),ymd('20170308'),ymd('20170215'),ymd('20170625'),ymd('20170524'),ymd('20170410'),ymd('20170329'),ymd('20170304'),ymd('20170215'),ymd('20170615'),ymd('20170524'),ymd('20170428'),ymd('20170329'),ymd('20170301'),ymd('20170225')),
           distance=c(1400L,1400L,1600L,1400L,1500L,1400L,1400L,1400L,1600L,1400L,1500L,1400L,1400L,1400L,1600L,1400L,1500L,1400L),
           finish=c(1L,2L,2L,1L,2L,3L,2L,3L,3L,2L,1L,1L,3L,1L,1L,3L,3L,2L)
           ) %>% 
  arrange(race_date, race_number, finish) %>% 
  # Calculation here to add Next out winner for each Race
  #
  mutate(next_out_winners = 0) %>% 
  group_by(race_date, race_number)

  #group by races to show the next out winner for each race
  #group_by(race_dte, race_number)


print(df,36)
#> # A tibble: 18 x 7
#> # Groups:   race_date, race_number [12]
#>    runner   race_number race_venue race_date  distance finish next_out_winners
#>    <chr>          <dbl> <chr>      <date>        <int>  <int>            <dbl>
#>  1 C.Hottle           1 xyx        2017-02-15     1400      1                0
#>  2 D.Wottle           1 xyx        2017-02-15     1400      3                0
#>  3 JJ.Watt            1 xyx        2017-02-25     1400      2                0
#>  4 JJ.Watt            2 xyx        2017-03-01     1500      3                0
#>  5 C.Hottle           2 xyx        2017-03-04     1500      1                0
#>  6 D.Wottle           2 xyx        2017-03-08     1500      2                0
#>  7 D.Wottle           3 xyx        2017-03-29     1400      1                0
#>  8 C.Hottle           3 xyx        2017-03-29     1400      2                0
#>  9 JJ.Watt            3 xyx        2017-03-29     1400      3                0
#> 10 C.Hottle           4 xyx        2017-04-10     1600      3                0
#> 11 D.Wottle           4 xyx        2017-04-20     1600      2                0
#> 12 JJ.Watt            4 xyx        2017-04-28     1600      1                0
#> 13 JJ.Watt            5 xyx        2017-05-24     1400      1                0
#> 14 D.Wottle           5 xyx        2017-05-24     1400      2                0
#> 15 C.Hottle           5 xyx        2017-05-24     1400      3                0
#> 16 JJ.Watt            6 xyx        2017-06-15     1400      3                0
#> 17 D.Wottle           6 xyx        2017-06-25     1400      1                0
#> 18 C.Hottle           6 xyx        2017-06-25     1400      2                0

创建于2020-05-14 重读包 (v0.3.0)

期望的输出(手工计算)

enter image description here

r dplyr tidyverse purrr
1个回答
1
投票

我没有用 map 功能--我使用了 lead

nextouts <- df %>% 
    # These are clutter
  select(- race_venue, -distance) %>% 
  arrange(runner, race_date) %>% 
  group_by(runner) %>% 
    # Use 'lead' to compare this race to the next one.
  mutate(next_finish = lead(finish, default= 0),
         next_date = lead(race_date, default = NA),
         is_next_out_win =
           (next_finish == 1 & next_date - race_date <= 90)) %>% 
  ungroup()

目前看来效果不错

# A tibble: 18 x 7
   runner   race_number race_date  finish next_finish next_date  is_next_out_win
   <chr>          <dbl> <date>      <int>       <dbl> <date>     <lgl>          
 1 C.Hottle           1 2017-02-15      1           1 2017-03-04 TRUE           
 2 C.Hottle           2 2017-03-04      1           2 2017-03-29 FALSE          
 3 C.Hottle           3 2017-03-29      2           3 2017-04-10 FALSE          
 4 C.Hottle           4 2017-04-10      3           3 2017-05-24 FALSE          
 5 C.Hottle           5 2017-05-24      3           2 2017-06-25 FALSE          
 6 C.Hottle           6 2017-06-25      2           0 NA         FALSE          
 7 D.Wottle           1 2017-02-15      3           2 2017-03-08 FALSE          
 8 D.Wottle           2 2017-03-08      2           1 2017-03-29 TRUE           
 9 D.Wottle           3 2017-03-29      1           2 2017-04-20 FALSE          
10 D.Wottle           4 2017-04-20      2           2 2017-05-24 FALSE          
11 D.Wottle           5 2017-05-24      2           1 2017-06-25 TRUE           
12 D.Wottle           6 2017-06-25      1           0 NA         FALSE          
13 JJ.Watt            1 2017-02-25      2           3 2017-03-01 FALSE          
14 JJ.Watt            2 2017-03-01      3           3 2017-03-29 FALSE          
15 JJ.Watt            3 2017-03-29      3           1 2017-04-28 TRUE           
16 JJ.Watt            4 2017-04-28      1           1 2017-05-24 TRUE           
17 JJ.Watt            5 2017-05-24      1           3 2017-06-15 FALSE          
18 JJ.Watt            6 2017-06-15      3           0 NA         FALSE    

看看每场比赛的结果。

nextoutsR <- nextouts %>% 
  arrange(race_number)

# A tibble: 18 x 7
   runner   race_number race_date  finish next_finish next_date  is_next_out_win
   <chr>          <dbl> <date>      <int>       <dbl> <date>     <lgl>          
 1 C.Hottle           1 2017-02-15      1           1 2017-03-04 TRUE           
 2 D.Wottle           1 2017-02-15      3           2 2017-03-08 FALSE          
 3 JJ.Watt            1 2017-02-25      2           3 2017-03-01 FALSE          
 4 C.Hottle           2 2017-03-04      1           2 2017-03-29 FALSE          
 5 D.Wottle           2 2017-03-08      2           1 2017-03-29 TRUE           
 6 JJ.Watt            2 2017-03-01      3           3 2017-03-29 FALSE          
 7 C.Hottle           3 2017-03-29      2           3 2017-04-10 FALSE          
 8 D.Wottle           3 2017-03-29      1           2 2017-04-20 FALSE          
 9 JJ.Watt            3 2017-03-29      3           1 2017-04-28 TRUE           
10 C.Hottle           4 2017-04-10      3           3 2017-05-24 FALSE          
11 D.Wottle           4 2017-04-20      2           2 2017-05-24 FALSE          
12 JJ.Watt            4 2017-04-28      1           1 2017-05-24 TRUE           
13 C.Hottle           5 2017-05-24      3           2 2017-06-25 FALSE          
14 D.Wottle           5 2017-05-24      2           1 2017-06-25 TRUE           
15 JJ.Watt            5 2017-05-24      1           3 2017-06-15 FALSE          
16 C.Hottle           6 2017-06-25      2           0 NA         FALSE          
17 D.Wottle           6 2017-06-25      1           0 NA         FALSE          
18 JJ.Watt            6 2017-06-15      3           0 NA         FALSE       

看起来每场比赛都有一个下一个赢家(除了最后一场) 这很合理,因为每场比赛都只有三匹马,而且是跑马。

我们来总结一下。 记住,你可以把 "真 "和 "假 "相加,因为它们是1,0。

nextOutWinsInRace <- nextouts %>% 
  group_by(race_number) %>% 
  summarise(nextOutWinCount = sum(is_next_out_win))

R> nextOutWinsInRace
# A tibble: 6 x 2
  race_number nextOutWinCount
        <dbl>           <int>
1           1               1
2           2               1
3           3               1
4           4               1
5           5               1
6           6               0

叫它收工。

最新问题
© www.soinside.com 2019 - 2024. All rights reserved.