Reshape long to wide and keep duplicate rows

Question

I have a data set that I want to convert from long format to wide format. I used the reshape function to do this.

id  status      timestamp   
1   assigned   2017-01-02  
1   done       2017-01-03  
1   locked     2017-01-04   
2   assigned   2017-01-02   
2   done       2017-01-03  
2   assigned   2017-01-03  
2   done       2017-01-04 
2   locked     2017-01-05  
3   assigned   2017-01-02  
3   done       2017-01-03 
3   locked     2017-01-04 
...

# reshape function to convert long format to Wide.
temp <- reshape(temp, idvar = "id", timevar = "status", direction = "wide")

Result:

id timestamp.assigned timestamp.done timestamp.locked
1          2017-01-02     2017-01-03       2017-01-04
2          2017-01-02     2017-01-03       2017-01-05
3          2017-01-02     2017-01-03       2017-01-04

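When I do this, some rows are dropped. For example, id 2 has multiple rows matching status = assigned, and reshape keeps only the first one.

How can I convert to wide format without dropping rows? Basically, I don't want to lose any data.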

Expected result:

id timestamp.assigned timestamp.done timestamp.locked
1          2017-01-02     2017-01-03       2017-01-04
2          2017-01-02     2017-01-03               NA
2          2017-01-03     2017-01-04       2017-01-05
3          2017-01-02     2017-01-03       2017-01-04

or

id timestamp.assigned timestamp.done timestamp.locked
1          2017-01-02     2017-01-03       2017-01-04
2          2017-01-02     2017-01-03       2017-01-05
2          2017-01-03     2017-01-04       2017-01-05
3          2017-01-02     2017-01-03       2017-01-04

Answer 1

One thing you can do is add a variable that gets a new unique value at each new assignment. You can then use it to reshape your data:

# running counter, incremented every time an "assigned" row is seen
i <- 0

temp$key <- sapply(temp$status, function(x) {
  if (x == "assigned") {i <<- i + 1; i}
  else {i}
})

temp

   id   status  timestamp key
1   1 assigned 2017-01-02   1
2   1     done 2017-01-03   1
3   1   locked 2017-01-04   1
4   2 assigned 2017-01-02   2
5   2     done 2017-01-03   2
6   2 assigned 2017-01-03   3
7   2     done 2017-01-04   3
8   2   locked 2017-01-05   3
9   3 assigned 2017-01-02   4
10  3     done 2017-01-03   4
11  3   locked 2017-01-04   4

# key and id together identify a row, so repeated assignments are kept apart
temp2 <- reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")

temp2

  id key timestamp.assigned timestamp.done timestamp.locked
1  1   1         2017-01-02     2017-01-03       2017-01-04
4  2   2         2017-01-02     2017-01-03             <NA>
6  2   3         2017-01-03     2017-01-04       2017-01-05
9  3   4         2017-01-02     2017-01-03       2017-01-04

Answer 2

1. `cumsum()`

Esther's approach of numbering each new assignment is the way to go.

However, R already has the `cumsum()` function, which can be used for this purpose:

temp$key <- cumsum(temp$status == "assigned")
reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")

   id key timestamp.assigned timestamp.done timestamp.locked
1:  1   1         2017-01-02     2017-01-03       2017-01-04
2:  2   2         2017-01-02     2017-01-03             <NA>
3:  2   3         2017-01-03     2017-01-04       2017-01-05
4:  3   4         2017-01-02     2017-01-03       2017-01-04

2. Grouped `cumsum()`

While this solves the OP's original question, `key` simply numbers all assignments across all `id`s. If the OP prefers the assignments to be numbered separately for each `id`, we need to apply `cumsum()` grouped by `id`.

One way to achieve this is to use `data.table` syntax:

library(data.table)
setDT(temp)[, key := cumsum(status == "assigned"), by = id]
dcast(temp, id + key ~ status, value.var = "timestamp")

   id key   assigned       done     locked
1:  1   1 2017-01-02 2017-01-03 2017-01-04
2:  2   1 2017-01-02 2017-01-03       <NA>
3:  2   2 2017-01-03 2017-01-04 2017-01-05
4:  3   1 2017-01-02 2017-01-03 2017-01-04

`dcast()` is an alternative to base R's `reshape(..., direction = "wide")` function and is available from the `reshape2` and `data.table` packages.
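For readers who prefer to stay in base R, the same per-id numbering can be sketched with `ave()`, which applies a function within groups. This is not part of the original answer, only an illustration of the grouped `cumsum()` idea. The sample data are retyped from the question, and storing the timestamps as character strings is an assumption; the name `wide` is made up for the example.

```r
# Sample data retyped from the question; timestamps kept as character
# strings (an assumption about the original column type).
temp <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  status = c("assigned", "done", "locked",
             "assigned", "done", "assigned", "done", "locked",
             "assigned", "done", "locked"),
  timestamp = c("2017-01-02", "2017-01-03", "2017-01-04",
                "2017-01-02", "2017-01-03", "2017-01-03", "2017-01-04",
                "2017-01-05",
                "2017-01-02", "2017-01-03", "2017-01-04"),
  stringsAsFactors = FALSE
)

# ave() runs cumsum() separately within each id, so the counter
# restarts at 1 for every id.
temp$key <- ave(as.integer(temp$status == "assigned"), temp$id, FUN = cumsum)

# id and key together identify one assignment cycle, so repeated
# assignments end up on separate rows instead of being dropped.
wide <- reshape(temp, idvar = c("id", "key"), timevar = "status",
                direction = "wide")
wide
```

With this key, the first cycle of id 2 has no locked row, so its `timestamp.locked` entry is `NA`, matching the first variant of the expected result in the question.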

3. Grouped `cumsum()` on-the-fly

The formula interface of `data.table`'s `dcast()` also accepts expressions. With this, there is no need to modify `temp` by appending a `key` column before reshaping. Instead, the key can be computed on the fly while reshaping.

Data
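The data block of this answer did not survive in this copy. As a stand-in, here is a minimal sketch that recreates the sample shown in the question. Reading the timestamps as plain character strings is an assumption, since the original column types are not shown.

```r
# Recreate the sample from the question. read.table() leaves the
# timestamp column as character; the original type is unknown (assumption).
temp <- read.table(text = "
id status   timestamp
1  assigned 2017-01-02
1  done     2017-01-03
1  locked   2017-01-04
2  assigned 2017-01-02
2  done     2017-01-03
2  assigned 2017-01-03
2  done     2017-01-04
2  locked   2017-01-05
3  assigned 2017-01-02
3  done     2017-01-03
3  locked   2017-01-04
", header = TRUE, stringsAsFactors = FALSE)

# Quick look at the column types and the first values.
str(temp)
```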

