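I am trying to automatically assign a group number to each row according to the time period it falls in, because I am writing a function that summarises weather time series over different user-defined periods. Let's call "n" the number of periods.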
d1 = seq(as.Date("1910/1/1"), as.Date("1910/1/20"), "days")
d2 = seq(as.Date("1911/2/4"), as.Date("1911/2/27"), "days")
id1 = rep("1", length(d1))
id2 = rep("2", length(d2))
df = data.frame(date = c(d1,d2), id = c(id1,id2))
df
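I want to cut the dates of each id into "n" periods and add the period number to each row of the data frame. If I want 4 periods, the result should look like this: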
df$period = c(rep(c(1:4), each = length(d1)/4), rep(c(1:4), each = length(d2)/4))
df
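In my real dataset the number of dates differs for each id. That is why I want to build the first groups with the same size and use whatever is left over to build the last group.
Say I want 4 periods. I wrote this, but it only gives me "4":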
library(dplyr)

df2 = df %>%
  group_by(date, id) %>%
  mutate(period = c(rep(seq(1, 4-1, by = 1), each = as.integer(length(date)/4)),
                    rep(4, length(date) - ((4-1)*as.integer(length(date)/4)))))
df2
Does anyone have an idea?
@hammoire:
So taking the first id as an example, I have 20 dates, and if I want to split them into 3 periods I expect: c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
Thanks
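A likely reason the attempt above returns only "4": `group_by(date, id)` puts every row in its own group, so `length(date)` is 1, `as.integer(1/4)` is 0, and only the final `rep(4, 1)` contributes. Below is a minimal sketch of one way around that, assuming grouping by `id` alone is what is intended. The helper `assign_period()` is a hypothetical name introduced here for illustration, not part of dplyr.

```r
library(dplyr)

d1 <- seq(as.Date("1910/1/1"), as.Date("1910/1/20"), "days")
d2 <- seq(as.Date("1911/2/4"), as.Date("1911/2/27"), "days")
df <- data.frame(date = c(d1, d2),
                 id = c(rep("1", length(d1)), rep("2", length(d2))))

# Hypothetical helper: period index for `len` rows split into `n` periods.
# The first n - 1 periods get floor(len / n) rows; the last takes the rest.
assign_period <- function(len, n) {
  size <- len %/% n
  if (size == 0) return(rep(n, len))  # fewer rows than periods
  pmin((seq_len(len) - 1L) %/% size + 1L, n)
}

df2 <- df %>%
  group_by(id) %>%                      # group by id only, not by date
  mutate(period = assign_period(n(), 4)) %>%
  ungroup()

# Rows per id and period
table(df2$id, df2$period)
```

With 20 dates and `n = 3`, `assign_period(20, 3)` gives six 1s, six 2s and eight 3s, which matches the layout described in the comment above.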
Using data.table:
library(data.table)
d <- data.table(df)
# running row number within each id
d[, t := sequence(.N), by=id]
# row numbers at which a new period starts (every 4th row);
# the sequence must be at least as long as the longest group,
# so instead of hard-coding 10000 you can use max(d$t)
sq <- seq(1, 10000, by=4)
# flag the first row of each period with 1, all other rows with 0
d[t %in% sq, y := 1 ]
d[is.na(y), y := 0]
# the cumulative sum of the flags within each id is the period number
d[, yy := cumsum(y), by=id]
date id t y yy
1: 1910-01-01 1 1 1 1
2: 1910-01-02 1 2 0 1
3: 1910-01-03 1 3 0 1
4: 1910-01-04 1 4 0 1
5: 1910-01-05 1 5 1 2
6: 1910-01-06 1 6 0 2
7: 1910-01-07 1 7 0 2
8: 1910-01-08 1 8 0 2
9: 1910-01-09 1 9 1 3
10: 1910-01-10 1 10 0 3
11: 1910-01-11 1 11 0 3
12: 1910-01-12 1 12 0 3
13: 1910-01-13 1 13 1 4
14: 1910-01-14 1 14 0 4
15: 1910-01-15 1 15 0 4
16: 1910-01-16 1 16 0 4
17: 1910-01-17 1 17 1 5
18: 1910-01-18 1 18 0 5
19: 1910-01-19 1 19 0 5
20: 1910-01-20 1 20 0 5
21: 1911-02-04 2 1 1 1
22: 1911-02-05 2 2 0 1
23: 1911-02-06 2 3 0 1
24: 1911-02-07 2 4 0 1
25: 1911-02-08 2 5 1 2
26: 1911-02-09 2 6 0 2
27: 1911-02-10 2 7 0 2
28: 1911-02-11 2 8 0 2
29: 1911-02-12 2 9 1 3
30: 1911-02-13 2 10 0 3
31: 1911-02-14 2 11 0 3
32: 1911-02-15 2 12 0 3
33: 1911-02-16 2 13 1 4
34: 1911-02-17 2 14 0 4
35: 1911-02-18 2 15 0 4
36: 1911-02-19 2 16 0 4
37: 1911-02-20 2 17 1 5
38: 1911-02-21 2 18 0 5
39: 1911-02-22 2 19 0 5
40: 1911-02-23 2 20 0 5
41: 1911-02-24 2 21 1 6
42: 1911-02-25 2 22 0 6
43: 1911-02-26 2 23 0 6
44: 1911-02-27 2 24 0 6
date id t y yy
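Note that the output above uses periods of a fixed length (4 days), so the number of periods depends on how many dates each id has (5 for id 1, 6 for id 2). If instead a fixed number of periods per id is wanted, with the leftover rows going to the last period as the question describes, a rough sketch with integer division could look like this. The column names `fixed` and `period` and the value `n <- 3L` are assumptions chosen for illustration.

```r
library(data.table)

d1 <- seq(as.Date("1910/1/1"), as.Date("1910/1/20"), "days")
d2 <- seq(as.Date("1911/2/4"), as.Date("1911/2/27"), "days")
d <- data.table(date = c(d1, d2),
                id = rep(c("1", "2"), c(length(d1), length(d2))))

n <- 3L  # assumed number of periods per id

# Fixed-length periods of 4 days: same numbering as the cumsum approach
d[, fixed := (seq_len(.N) - 1L) %/% 4L + 1L, by = id]

# Exactly n periods per id: the first n - 1 periods have floor(.N / n) rows,
# the last period takes the remaining rows. max(..., 1L) avoids a zero
# period size when an id has fewer than n rows.
d[, period := pmin((seq_len(.N) - 1L) %/% max(.N %/% n, 1L) + 1L, n), by = id]

# Rows per id and period
d[, .N, by = .(id, period)]
```

For id 1 (20 dates) this yields periods of 6, 6 and 8 rows; for id 2 (24 dates) it yields 8, 8 and 8. An id with fewer than `n` rows simply gets one row per period in this sketch.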