Filtering previous rows in a grouped data frame using dplyr


Given the data.frame below, how can I get the genre history (per player) for each game that was rated interesting?

player <- c(rep(1, 3), rep(2, 4))
game <- c(1:3, 1:4)
genre <- c("JnR", "Ego", "RPG", "RPG", "Sports", "JnR", "Sim")
interesting <- c("no", rep(c("no", "yes"), 3))

playerhist <- data.frame(player, game, genre, interesting)

> playerhist

  player game  genre interesting
1      1    1    JnR          no
2      1    2    Ego          no
3      1    3    RPG         yes
4      2    1    RPG          no
5      2    2 Sports         yes
6      2    3    JnR          no
7      2    4    Sim         yes

Desired output:

  player game genre_history
1      1    1           JnR
2      1    2           Ego
3      2    1           RPG
4      2    1           RPG
5      2    2        Sports
6      2    3           JnR

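So, for each interesting game, I want to include all previous rows of the corresponding player. As the desired output shows, the information from the "interesting" column does not need to be included, but it is fine if a solution includes it anyway. In that case the column would read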

c('no','no','no','no','yes','no')

A solution using dplyr would be preferred.

r dataframe dplyr filtering
1 Answer

I finally found a solution. This is for anyone trying to solve a similar problem:

library(dplyr)
library(stringr)  # provides str_detect()

#-------------------------------
# #1 running count of interesting games for each player
# #2 split df into a list of dfs, one per player
# #3 copy & append each df max(.$int_game) times
# #4 number the copies: a new number starts every time game == 1
# #5 split df into a list of dfs, one per player & df_num
# #6 count occurrences of interesting == 'yes'
# #7 flag the rows at which cut1 == df_num
# #8 keep only the rows before the first flagged row

gamehist <- playerhist %>%
  group_by(player) %>%
  mutate(int_game = cumsum(str_detect(interesting, "yes"))) %>% #1
  ungroup() %>%
  base::split(., .$player, drop = FALSE) %>% #2
  lapply(., function(df) {
    df %>%
      .[rep(1:nrow(.), max(.$int_game)), ] %>% #3
      mutate(df_num = cumsum(game == 1)) #4
  }) %>%
  bind_rows() %>%
  base::split(., list(.$player, .$df_num), drop = TRUE) %>% #5
  lapply(., function(df2) {
    df2 %>%
      mutate(cut1 = cumsum(str_detect(interesting, "yes")), #6
             cut2 = if_else(cut1 == df_num, 1, 0)) %>% #7
      # which() can match several rows, so take only the first match
      slice(seq_len(which(cut2 == 1)[1] - 1)) #8
  }) %>%
  bind_rows() %>%
  select(player, game, genre)

gamehist

# A tibble: 6 × 3
  player  game genre 
   <dbl> <int> <chr> 
1      1     1 JnR   
2      1     2 Ego   
3      2     1 RPG   
4      2     1 RPG   
5      2     2 Sports
6      2     3 JnR 
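
For comparison, here is a shorter sketch of the same idea. It is not the method used above, just one possible alternative: take each interesting game in turn, filter that player's earlier rows, and stack the results. The names targets and genre_history are made up for this illustration. It assumes that game numbers increase in play order within each player and that at least one game is marked interesting.

```r
library(dplyr)

# Example data from the question
playerhist <- data.frame(
  player = c(rep(1, 3), rep(2, 4)),
  game = c(1:3, 1:4),
  genre = c("JnR", "Ego", "RPG", "RPG", "Sports", "JnR", "Sim"),
  interesting = c("no", rep(c("no", "yes"), 3))
)

# One row per interesting game (hypothetical helper table)
targets <- playerhist %>%
  filter(interesting == "yes")

# For every interesting game, collect the same player's earlier games,
# then stack the pieces in the order of the interesting games
genre_history <- lapply(seq_len(nrow(targets)), function(i) {
  playerhist %>%
    filter(player == targets$player[i], game < targets$game[i])
}) %>%
  bind_rows() %>%
  select(player, game, genre_history = genre, interesting)

genre_history
```

On the example data this should give the six rows of the desired output, with the interesting column reading no, no, no, no, yes, no. Drop interesting from the final select() if it is not needed.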
