R - Replicating a data table and creating sequential dates

Question (votes: 1, answers: 1)

I have the following data table:

library(data.table)

dt <- data.table(date=c(seq.Date(as.Date("2000-01-01"),as.Date("2000-01-03"),"1 day")),
                 a=c(1,2,3),
                 b=c(1,2,3),
                 c=c(1,2,3))
> dt
         date a b c
1: 2000-01-01 1 1 1
2: 2000-01-02 2 2 2
3: 2000-01-03 3 3 3

I need to replicate it n times (code taken from "Repeat data.frame N times"):

n <- 3
dt.rep <- dt[rep(seq_len(nrow(dt)), n)]

> dt.rep
         date a b c
1: 2000-01-01 1 1 1
2: 2000-01-02 2 2 2
3: 2000-01-03 3 3 3
4: 2000-01-01 1 1 1
5: 2000-01-02 2 2 2
6: 2000-01-03 3 3 3
7: 2000-01-01 1 1 1
8: 2000-01-02 2 2 2
9: 2000-01-03 3 3 3

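However, I need the date column to be sequential. The last row holds the actual last date, and the dates should run backwards from there to the first row, so the expected output is: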

         date a b c
1: 1999-12-26 1 1 1
2: 1999-12-27 2 2 2
3: 1999-12-28 3 3 3
4: 1999-12-29 1 1 1
5: 1999-12-30 2 2 2
6: 1999-12-31 3 3 3
7: 2000-01-01 1 1 1
8: 2000-01-02 2 2 2
9: 2000-01-03 3 3 3

How can I do this?

Edit:

The suggested solution seems to fail for a large hourly dataset. Consider this new example:

dt <- data.table(date=seq(as.POSIXct("1994-01-01 00:00:00"), as.POSIXct("2008-12-31 23:00:00"), by="1 hour"), temp=runif(n=131496, min=10, max=35)) 
> dt
                       date     temp
     1: 1994-01-01 00:00:00 26.40286
     2: 1994-01-01 01:00:00 21.37171
     3: 1994-01-01 02:00:00 16.11227
     4: 1994-01-01 03:00:00 30.28062
     5: 1994-01-01 04:00:00 25.22336
    ---                             
131492: 2008-12-31 19:00:00 18.43148
131493: 2008-12-31 20:00:00 24.10905
131494: 2008-12-31 21:00:00 10.33235
131495: 2008-12-31 22:00:00 27.73049
131496: 2008-12-31 23:00:00 21.74835

When replicating it 5 times, this is what we get:

n <- 5
dt[rep(seq_len(.N), n)][, newdate:=rev(seq(last(date),
                                           length.out=.N, by='-1 hour'))][]
                       date     temp             newdate
     1: 1994-01-01 00:00:00 26.40286 1933-12-31 00:00:00
     2: 1994-01-01 01:00:00 21.37171 1933-12-31 01:00:00
     3: 1994-01-01 02:00:00 16.11227 1933-12-31 02:00:00
     4: 1994-01-01 03:00:00 30.28062 1933-12-31 03:00:00
     5: 1994-01-01 04:00:00 25.22336 1933-12-31 04:00:00
    ---                                                 
657476: 2008-12-31 19:00:00 18.43148 2008-12-31 19:00:00
657477: 2008-12-31 20:00:00 24.10905 2008-12-31 20:00:00
657478: 2008-12-31 21:00:00 10.33235 2008-12-31 21:00:00
657479: 2008-12-31 22:00:00 27.73049 2008-12-31 22:00:00
657480: 2008-12-31 23:00:00 21.74835 2008-12-31 23:00:00

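Note how the date and newdate columns are out of sync. I expected newdate to start at 1934-01-01 00:00:00, but it starts at 1933-12-31 00:00:00 instead. As a result the data table spans 76 years (length(unique(year(dt$newdate)))) rather than the 75 years that 5 replications of the original 15 years should give. I'm not sure what is happening here...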
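One possible reading of these numbers, offered as an assumption rather than something the thread confirms: the 1994-2008 block contains 4 leap days, so five copies carry 20 of them, while the 75 calendar years from 1934 through 2008 contain only 19. The replicated table is therefore one day longer than 75 calendar years, which would push the start back to 1933-12-31. The sketch below checks that arithmetic in base R; the variable names are illustrative, and UTC is used so that daylight saving time does not interfere.

```r
# Days covered by the original block, 1994-01-01 through 2008-12-31 (inclusive)
days_block <- as.integer(as.Date("2008-12-31") - as.Date("1994-01-01")) + 1L

# Days in the 75 calendar years 1934 through 2008 (inclusive)
days_75y <- as.integer(as.Date("2008-12-31") - as.Date("1934-01-01")) + 1L

n <- 5
days_rep <- n * days_block  # days of hourly rows after replication

# Step back one hour per replicated row, starting from the last timestamp
last_ts  <- as.POSIXct("2008-12-31 23:00:00", tz = "UTC")
first_ts <- last_ts - (days_rep * 24 - 1) * 3600

print(c(block = days_block, rep = days_rep, cal75 = days_75y))
print(format(first_ts, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

If this reading is right, the hourly sequence itself is continuous and the extra calendar year comes purely from the leap-day mismatch.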

r date data.table
1 Answer (1 vote)

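After the replication step, take the last 'date' and build a sequence from it with seq, setting length.out to the number of rows (.N) and by to negative 1 day, then reverse that sequence with rev: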

dt[rep(seq_len(.N), n)][, date := rev(seq(last(date),
       length.out = .N, by = '-1 day'))][]
#         date a b c
#1: 1999-12-26 1 1 1
#2: 1999-12-27 2 2 2
#3: 1999-12-28 3 3 3
#4: 1999-12-29 1 1 1
#5: 1999-12-30 2 2 2
#6: 1999-12-31 3 3 3
#7: 2000-01-01 1 1 1
#8: 2000-01-02 2 2 2
#9: 2000-01-03 3 3 3
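The same idea can be wrapped in a small function. This is only a sketch: `rep_back` and its `step` argument are hypothetical names that do not appear in the thread, and the function assumes the column to rebuild is called `date`.

```r
library(data.table)

# Hypothetical helper (not from the thread): repeat the rows n times, then
# rebuild `date` as one unbroken sequence that ends at the last original date.
rep_back <- function(x, n, step = "-1 day") {
  out <- x[rep(seq_len(nrow(x)), n)]  # subsetting returns a new table
  out[, date := rev(seq(last(date), length.out = .N, by = step))]
  out[]
}

dt <- data.table(date = seq(as.Date("2000-01-01"), as.Date("2000-01-03"), by = "1 day"),
                 a = c(1, 2, 3))
res <- rep_back(dt, 3)
print(res)
```

Because the row subset creates a new table, the `:=` inside the function should leave the caller's `dt` untouched.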

Update

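Based on the OP's comments, it seems that the 'date' sequence should be reversed within each replication. In that case, we can use the replication index as the grouping variable: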

n <- 5
dt[rep(seq_len(.N), n)][, newdate := rev(seq(last(date),
   length.out = .N, by='-1 hour')), by = .(rep(seq_len(n), each = nrow(dt)))][]
#                  date     temp             newdate
#1: 1994-01-01 00:00:00 34.19615 1994-01-01 00:00:00
#2: 1994-01-01 01:00:00 34.29310 1994-01-01 01:00:00
# ...

Note: this uses the updated data from the OP's post.
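To see what the grouped version does, here is a small sketch on a four-row hourly table. The `grp` column is an illustrative name added here to make the grouping visible; the answer passes the same expression directly in `by`. UTC is assumed to keep daylight saving time out of the picture.

```r
library(data.table)

dt <- data.table(date = seq(as.POSIXct("2000-01-01 00:00:00", tz = "UTC"),
                            by = "1 hour", length.out = 4),
                 temp = c(10, 11, 12, 13))
n <- 2

out <- dt[rep(seq_len(.N), n)]
# One id per copy: 1,1,1,1,2,2,2,2
out[, grp := rep(seq_len(n), each = nrow(dt))]
# Within each copy, count back from that copy's own last timestamp
out[, newdate := rev(seq(last(date), length.out = .N, by = "-1 hour")), by = grp]
print(out)
```

Each group ends at its own last timestamp, so newdate restarts with every copy and matches date row for row, which is consistent with the output shown in the answer.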
