For a given dataset, I want to convert my data from long format to wide format. I used the reshape function to do this.
id status timestamp
1 assigned 2017-01-02
1 done 2017-01-03
1 locked 2017-01-04
2 assigned 2017-01-02
2 done 2017-01-03
2 assigned 2017-01-03
2 done 2017-01-04
2 locked 2017-01-05
3 assigned 2017-01-02
3 done 2017-01-03
3 locked 2017-01-04
...
# reshape function to convert long format to Wide.
temp <- reshape(temp, idvar = "id", timevar = "status", direction = "wide")
Result:
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
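When I do this, it drops some rows. For example, for id 2 there are multiple rows matching status=assigned, and it takes only the first one.
How can I convert to wide format without dropping rows? Basically, I don't want to lose any data.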
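To make the row loss concrete, here is a minimal sketch that rebuilds the sample rows shown above as a data frame and runs the same reshape() call. The data frame construction and the names `dup` and `wide` are assumptions added for illustration; only the eleven rows visible in the question are used, and the timestamps are assumed to be plain character strings.

```r
# Rebuild the sample rows from the question (assumed to be character dates)
temp <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  status = c("assigned", "done", "locked",
             "assigned", "done", "assigned", "done", "locked",
             "assigned", "done", "locked"),
  timestamp = c("2017-01-02", "2017-01-03", "2017-01-04",
                "2017-01-02", "2017-01-03", "2017-01-03", "2017-01-04", "2017-01-05",
                "2017-01-02", "2017-01-03", "2017-01-04"),
  stringsAsFactors = FALSE
)

# id 2 has two "assigned" and two "done" rows, so (id, status) is not unique
dup <- duplicated(temp[, c("id", "status")])
print(temp[dup, ])

# reshape() keeps only the first match per (id, status) and warns about it
wide <- reshape(temp, idvar = "id", timevar = "status", direction = "wide")
print(nrow(temp))  # 11 rows in long format
print(nrow(wide))  # 3 rows in wide format: the second cycle of id 2 is gone
```

The duplicated (id, status) pairs are exactly the rows that reshape() cannot place, which is why an extra identifying column is needed.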
Expected result:
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 2017-01-05
2 2017-01-03 2017-01-04 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
or
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 NA
2 2017-01-03 2017-01-04 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
One thing you can do is add a variable that takes a unique value for each new assignment. You can then use it to reshape your data.
Numbering each new assignment is the way to go. R already has the cumsum() function, which can be used for this purpose, but the counter can also be built by hand:
# Running counter: incremented each time an "assigned" row starts a new task
i <- 0
temp$key <- sapply(temp$status, function(x) {
  if (x == "assigned") {i <<- i + 1; i}
  else {i}
})
temp
id status timestamp key
1 1 assigned 2017-01-02 1
2 1 done 2017-01-03 1
3 1 locked 2017-01-04 1
4 2 assigned 2017-01-02 2
5 2 done 2017-01-03 2
6 2 assigned 2017-01-03 3
7 2 done 2017-01-04 3
8 2 locked 2017-01-05 3
9 3 assigned 2017-01-02 4
10 3 done 2017-01-03 4
11 3 locked 2017-01-04 4
temp2 <- reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")
temp2
id key timestamp.assigned timestamp.done timestamp.locked
1 1 1 2017-01-02 2017-01-03 2017-01-04
4 2 2 2017-01-02 2017-01-03 <NA>
6 2 3 2017-01-03 2017-01-04 2017-01-05
9 3 4 2017-01-02 2017-01-03 2017-01-04
cumsum() computes the same key in a single line. While this solves the OP's original problem, it simply numbers all assignments across all ids:
temp$key <- cumsum(temp$status == "assigned")
reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")
   id key timestamp.assigned timestamp.done timestamp.locked
1: 1 1 2017-01-02 2017-01-03 2017-01-04
2: 2 2 2017-01-02 2017-01-03 <NA>
3: 2 3 2017-01-03 2017-01-04 2017-01-05
4: 3 4 2017-01-02 2017-01-03 2017-01-04
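Why cumsum() works here: comparing status with "assigned" gives a logical vector that is TRUE exactly where a new task starts, and the cumulative sum of that vector goes up by one at each such row. A small sketch, using a status vector copied from the sample data (the names `starts`, `key` and `key_manual` are illustrative):

```r
status <- c("assigned", "done", "locked",
            "assigned", "done", "assigned", "done", "locked",
            "assigned", "done", "locked")

# TRUE marks the first row of every task
starts <- status == "assigned"

# TRUE counts as 1 and FALSE as 0, so the running total is a task number
key <- cumsum(starts)
print(key)  # 1 1 1 2 2 3 3 3 4 4 4

# The hand-written counter shown earlier yields the same numbering
i <- 0
key_manual <- sapply(status, function(x) {
  if (x == "assigned") i <<- i + 1
  i
}, USE.NAMES = FALSE)
print(all(key == key_manual))
```

Note that this relies on the rows being ordered so that every task begins with its "assigned" row, as they are in the sample data.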
If the OP prefers to number the assignments of each id separately, cumsum() has to be applied grouped by id.
One way to achieve this is data.table syntax:
library(data.table)
setDT(temp)[, key := cumsum(status == "assigned"), by = id]
dcast(temp, id + key ~ status, value.var = "timestamp")
   id key   assigned       done     locked
1: 1 1 2017-01-02 2017-01-03 2017-01-04
2: 2 1 2017-01-02 2017-01-03 <NA>
3: 2 2 2017-01-03 2017-01-04 2017-01-05
4: 3 1 2017-01-02 2017-01-03 2017-01-04
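For readers without data.table, the same per-id numbering can be sketched in base R with ave(), which applies a function within groups. This is an alternative that does not come from the answer above; the data frame is rebuilt from the sample rows and the names are illustrative.

```r
# Rebuild the sample rows from the question (assumed to be character dates)
temp <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  status = c("assigned", "done", "locked",
             "assigned", "done", "assigned", "done", "locked",
             "assigned", "done", "locked"),
  timestamp = c("2017-01-02", "2017-01-03", "2017-01-04",
                "2017-01-02", "2017-01-03", "2017-01-03", "2017-01-04", "2017-01-05",
                "2017-01-02", "2017-01-03", "2017-01-04"),
  stringsAsFactors = FALSE
)

# Apply cumsum() separately within each id; as.integer() turns TRUE/FALSE into 1/0
temp$key <- ave(as.integer(temp$status == "assigned"), temp$id, FUN = cumsum)
print(temp$key)  # 1 1 1 1 1 2 2 2 1 1 1

# (id, key) now identifies one assigned/done/locked cycle, so no row is dropped
wide <- reshape(temp, idvar = c("id", "key"), timevar = "status", direction = "wide")
print(wide)
```

The first cycle of id 2 has no "locked" row, so its timestamp.locked cell is NA, which matches the second expected result in the question.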
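dcast() is a replacement for base R's reshape(..., direction = "wide") function and is available from the reshape2 and data.table packages.
on-the-fly
The formula interface of data.table's dcast() also accepts expressions. With this, there is no need to modify temp by appending a key column before reshaping. Instead, the numbering can be done on the fly while reshaping.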
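The code for the on-the-fly variant did not survive in this copy of the answer, so the following is only a sketch of what such a call might look like, not the author's original code. It assumes the data.table package is installed and that the cumsum() expression can be placed directly on the left-hand side of the dcast() formula; the requireNamespace() guard and the name `wide` are additions for illustration. Because cumsum() is not grouped here, it numbers the assignments across all ids, like the ungrouped base R version.

```r
# Rebuild the sample rows from the question (assumed to be character dates)
temp <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  status = c("assigned", "done", "locked",
             "assigned", "done", "assigned", "done", "locked",
             "assigned", "done", "locked"),
  timestamp = c("2017-01-02", "2017-01-03", "2017-01-04",
                "2017-01-02", "2017-01-03", "2017-01-03", "2017-01-04", "2017-01-05",
                "2017-01-02", "2017-01-03", "2017-01-04"),
  stringsAsFactors = FALSE
)

# data.table is an external package; the guard keeps the sketch runnable without it
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  # The cumsum() expression sits in the formula, so temp itself gets no key column
  wide <- dcast(as.data.table(temp),
                id + cumsum(status == "assigned") ~ status,
                value.var = "timestamp")
  print(wide)
}
```

as.data.table() is used instead of setDT() so that the original temp stays an unmodified data frame.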