Given the following data.frame, how can I get the genre history (per player) for each game that was rated interesting?
player <- c(rep (1,3), rep(2,4))
game <- c(seq(1:3), seq(1:4))
genre <- c("JnR", "Ego", "RPG", "RPG", "Sports", "JnR", "Sim")
interesting <- c("no", rep(c("no","yes"),3))
playerhist <- data.frame (player, game, genre, interesting)
> playerhist
  player game  genre interesting
1      1    1    JnR          no
2      1    2    Ego          no
3      1    3    RPG         yes
4      2    1    RPG          no
5      2    2 Sports         yes
6      2    3    JnR          no
7      2    4    Sim         yes
Desired output:
  player game genre_history
1      1    1           JnR
2      1    2           Ego
3      2    1           RPG
4      2    1           RPG
5      2    2        Sports
6      2    3           JnR
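So, for each interesting game, I want to include all of the respective player's previous rows. Player 2 has two interesting games (games 2 and 4), so that player's earlier games appear once per interesting game. As the desired output shows, the information from the interesting column does not need to be included, but it is fine if a solution includes it anyway. That column would read c('no','no','no','no','yes','no').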
A solution using dplyr would be preferred.
I finally found a solution. Here it is for anyone trying to solve a similar problem:
library(dplyr)
library(stringr)  # str_detect() comes from stringr, not dplyr
#-------------------------------
# #1 running count of interesting games for each player
# #2 split df into a list of dfs, one per player
# #3 repeat each player's df max(.$int_game) times
# #4 number the copies: df_num increases every time game == 1
# #5 split into a list of dfs, one per player & df_num
# #6 running count of interesting == 'yes' within each copy
# #7 flag the rows where cut1 == df_num
# #8 keep only the rows before the first flagged row
gamehist <- playerhist %>%
  group_by(player) %>%
  mutate(int_game = cumsum(str_detect(interesting, "yes"))) %>% #1
  ungroup() %>%
  base::split(., .$player, drop = FALSE) %>% #2
  lapply(., function(df) {
    df %>%
      .[rep(1:nrow(.), max(.$int_game)), ] %>% #3
      mutate(df_num = cumsum(game == 1)) #4
  }) %>%
  bind_rows() %>%
  base::split(., list(.$player, .$df_num), drop = TRUE) %>% #5
  lapply(., function(df2) {
    df2 %>%
      mutate(cut1 = cumsum(str_detect(interesting, "yes")), #6
             cut2 = if_else(cut1 == df_num, 1, 0)) %>% #7
      # which() may match several rows; use the first one and
      # keep the rows that come before it
      slice(seq_len(which(cut2 == 1)[1] - 1)) #8
  }) %>%
  bind_rows() %>%
  select(player, game, genre)
gamehist
# A tibble: 6 × 3
  player  game genre
   <dbl> <int> <chr>
1      1     1 JnR
2      1     2 Ego
3      2     1 RPG
4      2     1 RPG
5      2     2 Sports
6      2     3 JnR
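For anyone who wants something shorter, the same result can probably be reached without copying and re-splitting the data. The following is only a sketch, not a tested drop-in replacement: it assumes that the rows of each player can be sorted by game to give the play order, and it uses dplyr's group_modify() to collect, for every interesting row, the indices of all rows before it. The names hits and genre_history are just illustrative.

```r
library(dplyr)

# Sample data from the question
playerhist <- data.frame(
  player      = c(rep(1, 3), rep(2, 4)),
  game        = c(1:3, 1:4),
  genre       = c("JnR", "Ego", "RPG", "RPG", "Sports", "JnR", "Sim"),
  interesting = c("no", rep(c("no", "yes"), 3))
)

genre_history <- playerhist %>%
  arrange(player, game) %>%            # assumption: game number gives the play order
  group_by(player) %>%
  group_modify(function(rows, key) {
    # positions of the interesting games within this player's rows
    hits <- which(rows$interesting == "yes")
    # for each hit, the indices of all earlier rows (empty if there is no hit)
    idx <- as.integer(unlist(lapply(hits, function(i) seq_len(i - 1))))
    rows[idx, ]
  }) %>%
  ungroup()

genre_history
```

This version keeps the interesting column; adding select(player, game, genre) at the end would drop it. Players without any interesting game simply contribute no rows.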