How do I lag by an integer variable in R?


Say I have the following historical league results:

Season <- c(1,1,2,2,3,3,4,4,5,5)
Team <- c("Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton")
End.Rank <- c(8,17,4,15,3,6,4,16,3,17)
PLRank <- cbind(Season,Team,End.Rank)

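I would like to (efficiently) create a one-year lagged variable for each team, based on two criteria:

  1. End.Rank lagged by Season (i.e. t-1, with Season as the time variable)
  2. Separately for each team (Deverton's lagged End.Rank is independent of Diverpool's lagged End.Rank)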

Basically, I would like the output to look like this:

l.End.Rank <- c(NA,NA,8,17,4,15,3,6,4,16)

I tried lag(), and then got lost trying to do it in a for() loop.

r time-series analytics categories lag
1 Answer

You can try one of the following...

Note that I have used a data.frame rather than the matrix that cbind produces.

PLRank <- data.frame(Season, Team, End.Rank)
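
As a small illustration of why the data.frame matters here: cbind on a mix of numeric and character vectors returns a character matrix, so the ranks stop being numbers. The sketch below uses a shortened copy of the question's data; the names as_matrix and as_frame are made up for this example.

```r
# Shortened copy of the question's data (first two seasons only)
Season   <- c(1, 1, 2, 2)
Team     <- c("Diverpool", "Deverton", "Diverpool", "Deverton")
End.Rank <- c(8, 17, 4, 15)

as_matrix <- cbind(Season, Team, End.Rank)        # hypothetical name
as_frame  <- data.frame(Season, Team, End.Rank)   # hypothetical name

typeof(as_matrix)            # "character": every column was coerced to text
is.numeric(as_frame$End.Rank)  # TRUE: the data.frame keeps per-column types
```

A matrix can hold only one type, which is why lagging End.Rank is easier once the data live in a data.frame.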

Using "data.table":

library(data.table)
setDT(PLRank)[, l.End.Rank := shift(End.Rank), by = .(Team)][]
#     Season      Team End.Rank l.End.Rank
#  1:      1 Diverpool        8         NA
#  2:      1  Deverton       17         NA
#  3:      2 Diverpool        4          8
#  4:      2  Deverton       15         17
#  5:      3 Diverpool        3          4
#  6:      3  Deverton        6         15
#  7:      4 Diverpool        4          3
#  8:      4  Deverton       16          6
#  9:      5 Diverpool        3          4
# 10:      5  Deverton       17         16

Or, using "dplyr":

library(dplyr)
PLRank %>%
  group_by(Team) %>%
  mutate(l.End.Rank = lag(End.Rank))
# Source: local data frame [10 x 4]
# Groups: Team [2]
# 
#    Season      Team End.Rank l.End.Rank
#     (dbl)    (fctr)    (dbl)      (dbl)
# 1       1 Diverpool        8         NA
# 2       1  Deverton       17         NA
# 3       2 Diverpool        4          8
# 4       2  Deverton       15         17
# 5       3 Diverpool        3          4
# 6       3  Deverton        6         15
# 7       4 Diverpool        4          3
# 8       4  Deverton       16          6
# 9       5 Diverpool        3          4
# 10      5  Deverton       17         16
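
Both snippets above lag by row position within each team, so they rely on the rows already being in Season order. If that cannot be assumed, one possible base R sketch (no packages) makes the ordering explicit. The helper lag_one and the index ord are names invented for this illustration, not part of any package.

```r
PLRank <- data.frame(
  Season   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Team     = rep(c("Diverpool", "Deverton"), times = 5),
  End.Rank = c(8, 17, 4, 15, 3, 6, 4, 16, 3, 17)
)

# Hypothetical helper: shift a vector down one position, padding with NA
lag_one <- function(x) c(NA, head(x, -1))

# Row indices sorted by team, then season, so that "previous row within a
# team" really means "previous season"
ord <- order(PLRank$Team, PLRank$Season)

# Lag within each team on the sorted copy, then write the values back to
# their original rows
PLRank$l.End.Rank <- NA_real_
PLRank$l.End.Rank[ord] <- ave(PLRank$End.Rank[ord], PLRank$Team[ord], FUN = lag_one)

PLRank
```

The original row order is left untouched; only the lag computation sees the sorted view.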

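Update

I honestly completely misread that you wanted to group by season.

If you are lagging by season, perhaps you should consider widening your data so that there is only one row per season. Lagging by season is then easy.

Example:

Here, we use dcast from "data.table" to spread the values of "End.Rank" out by "Team". Then we lag just the newly created columns.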

library(data.table)
teams <- as.character(unique(PLRank$Team))
dcast(as.data.table(PLRank), Season ~ Team, value.var = "End.Rank")[
  , (teams) := lapply(.SD, shift), .SDcols = teams][]
#    Season Deverton Diverpool
# 1:      1       NA        NA
# 2:      2       17         8
# 3:      3       15         4
# 4:      4        6         3
# 5:      5       16         4

Or, if you want both the team names and the values in wide form, you can try the following:

dcast(as.data.table(PLRank)[, ind := sequence(.N), by = Season], 
      Season ~ ind, value.var = c("Team", "End.Rank"))[
        , c("End.Rank_1", "End.Rank_2") := lapply(.SD, shift), 
        .SDcols = c("End.Rank_1", "End.Rank_2")][]
#    Season    Team_1   Team_2 End.Rank_1 End.Rank_2
# 1:      1 Diverpool Deverton         NA         NA
# 2:      2 Diverpool Deverton          8         17
# 3:      3 Diverpool Deverton          4         15
# 4:      4 Diverpool Deverton          3          6
# 5:      5 Diverpool Deverton          4         16

The approach in "dplyr" is similar. Since you are going to a wide format, you also need to load "tidyr".

library(dplyr)
library(tidyr)
PLRank %>%
  spread(Team, End.Rank) %>%
  # mutate_each() and funs() are deprecated; across() needs dplyr >= 1.0.0
  mutate(across(-Season, lag))
#   Season Deverton Diverpool
# 1      1       NA        NA
# 2      2       17         8
# 3      3       15         4
# 4      4        6         3
# 5      5       16         4
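
The widen-then-lag idea can also be sketched in base R, for cases where neither package is available. This is only an illustration under the assumption that each Season/Team pair appears once; the helper lag_one and the object name wide are made up here, and reshape() names the new columns End.Rank.<Team>.

```r
PLRank <- data.frame(
  Season   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Team     = rep(c("Diverpool", "Deverton"), times = 5),
  End.Rank = c(8, 17, 4, 15, 3, 6, 4, 16, 3, 17)
)

# Hypothetical helper: shift a vector down one position, padding with NA
lag_one <- function(x) c(NA, head(x, -1))

# One row per season, one End.Rank column per team
wide <- reshape(PLRank, idvar = "Season", timevar = "Team", direction = "wide")

# Make sure rows are in season order before lagging by position
wide <- wide[order(wide$Season), ]

# Lag every column except Season
wide[-1] <- lapply(wide[-1], lag_one)

wide
```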