I have a (simplified) data frame that looks like this:
Index StudioEvent
1
2 MovieStart
3
4
5
6
7 MovieEnd
8
9
10 MovieStart
11
12
13
14
15 MovieEnd
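I want to create a third column holding a sequence that goes up in steps of 50, starting at 0 on the row where StudioEvent = MovieStart and ending on the row where StudioEvent = MovieEnd. So something like this:
Index StudioEvent Sequence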
1
2 MovieStart 0
3 50
4 100
5 150
6 200
7 MovieEnd 250
8
9
10 MovieStart 0
11 50
12 100
13 150
14 200
15 MovieEnd 250
Any ideas how I could do this? Thanks in advance.
Here is a base R option
inds <- Map(`:`,which(df$StudioEvent=="MovieStart"),which(df$StudioEvent=="MovieEnd"))
df$Sequence<-as.numeric(replace(df$StudioEvent,unlist(inds),(unlist(Map(seq_along,inds))-1)*50))
which gives
> df
Index StudioEvent Sequence
1 1 <NA> NA
2 2 MovieStart 0
3 3 <NA> 50
4 4 <NA> 100
5 5 <NA> 150
6 6 <NA> 200
7 7 MovieEnd 250
8 8 <NA> NA
9 9 <NA> NA
10 10 MovieStart 0
11 11 <NA> 50
12 12 <NA> 100
13 13 <NA> 150
14 14 <NA> 200
15 15 MovieEnd 250
Data
> dput(df)
structure(list(Index = 1:15, StudioEvent = c(NA, "MovieStart",
NA, NA, NA, NA, "MovieEnd", NA, NA, "MovieStart", NA, NA, NA,
NA, "MovieEnd")), row.names = c(NA, -15L), class = "data.frame")
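The one-liner above packs several steps together, so here is a sketch that unpacks it. The names `starts` and `ends` are mine, not from the answer, and I fill an all-NA numeric column directly instead of going through `replace()` and `as.numeric()`. It assumes every MovieStart has a matching MovieEnd later on and that the pairs come in the same order.

```r
# Same data as in the dput() output above
df <- data.frame(
  Index = 1:15,
  StudioEvent = c(NA, "MovieStart", NA, NA, NA, NA, "MovieEnd",
                  NA, NA, "MovieStart", NA, NA, NA, NA, "MovieEnd"),
  stringsAsFactors = FALSE
)

# which() drops the NA comparisons, so only real matches are returned
starts <- which(df$StudioEvent == "MovieStart")  # rows 2 and 10
ends   <- which(df$StudioEvent == "MovieEnd")    # rows 7 and 15

# One vector of row numbers per movie: 2:7 and 10:15
inds <- Map(`:`, starts, ends)

# Start from an all-NA numeric column, then fill each block with 0, 50, 100, ...
df$Sequence <- NA_real_
df$Sequence[unlist(inds)] <- (unlist(Map(seq_along, inds)) - 1) * 50

df
```

Rows outside any movie stay NA, which matches the blank cells in the desired output.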
You can try this; it worked for me.
n<-data.frame(index= 1:15,
studio=c(NA,"MS",NA,NA,NA,NA,"ME",NA,NA,"MS",NA,NA,NA,NA,"ME"))
n$studio2<-0 #New column
n$studio2[n$studio=="MS"]<-1 ; n$studio2[n$studio=="ME"]<-2
n$seq<-0 #New column sequence
j<-1 #counter
k1<- which(n$studio=="MS") #where movie starts
k2<- which(n$studio=="ME") #where movie ends
Loop
for(i in seq_along(n$studio2))
{
if(n$studio2[i]==1)
{
k<- k2[j]-k1[j]
w<-seq(0,k*50,by=50)
n$seq[k1[j]:k2[j]]<-w
j<-j+1
}
}
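If you need this more than once, the same loop idea can be wrapped in a small function. This is only a sketch: `add_sequence` and its arguments are names I made up, it loops over the start/end pairs rather than over every row, and it leaves rows outside a movie as NA instead of 0. It assumes the starts and ends pair up in order.

```r
# Hypothetical helper, not part of the answer above
add_sequence <- function(studio, start = "MS", end = "ME", step = 50) {
  k1 <- which(studio == start)  # where movies start
  k2 <- which(studio == end)    # where movies end
  # Assumption: one end per start, and each end is not before its start
  stopifnot(length(k1) == length(k2), all(k1 <= k2))

  out <- rep(NA_real_, length(studio))
  for (j in seq_along(k1)) {
    # 0, step, 2*step, ... with one value per row of movie j
    out[k1[j]:k2[j]] <- seq(0, by = step, length.out = k2[j] - k1[j] + 1)
  }
  out
}

studio <- c(NA, "MS", NA, NA, NA, NA, "ME", NA, NA, "MS", NA, NA, NA, NA, "ME")
res <- add_sequence(studio)
res
```

With the data frame from the answer you would call it as `n$seq <- add_sequence(n$studio)`.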
An option using data.table:
#identify indices between MovieStart and MovieEnd
DT[, cs := cumsum(StudioEvent=="MovieStart") - cumsum(StudioEvent=="MovieEnd")]
#perform rolling join to find the start of movies for MovieEnd and indices between MovieStart and MovieEnd
DT[StudioEvent=="MovieEnd" | cs == 1L,
ms := DT[StudioEvent=="MovieStart"][.SD, on=.(Index), roll=Inf, x.Index]
]
#generate sequence
DT[, Sequence := (Index - ms) * 50]
Output:
Index StudioEvent cs ms Sequence
1: 1 0 NA NA
2: 2 MovieStart 1 2 0
3: 3 1 2 50
4: 4 1 2 100
5: 5 1 2 150
6: 6 1 2 200
7: 7 MovieEnd 0 2 250
8: 8 0 NA NA
9: 9 0 NA NA
10: 10 MovieStart 1 10 0
11: 11 1 10 50
12: 12 1 10 100
13: 13 1 10 150
14: 14 1 10 200
15: 15 MovieEnd 0 10 250
Data:
library(data.table)
DT <- fread("Index,StudioEvent
1,
2,MovieStart
3,
4,
5,
6,
7,MovieEnd
8,
9,
10,MovieStart
11,
12,
13,
14,
15,MovieEnd")
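For comparison, a sketch of a variant that skips the rolling join and numbers the rows within each movie by group instead. The `movie` column is my own addition, and I build the table with `data.table()` using empty strings for the blank events, which is an assumption about how the data was read in. It also assumes movies never overlap.

```r
library(data.table)

DT <- data.table(
  Index = 1:15,
  StudioEvent = c("", "MovieStart", "", "", "", "", "MovieEnd",
                  "", "", "MovieStart", "", "", "", "", "MovieEnd")
)

# movie: running count of starts, used as a group id
DT[, movie := cumsum(StudioEvent == "MovieStart")]
# cs: 1 while a movie is running, drops back to 0 on the MovieEnd row
DT[, cs := movie - cumsum(StudioEvent == "MovieEnd")]

# Number only the rows inside a movie (including its MovieEnd row), per movie
DT[cs == 1L | StudioEvent == "MovieEnd",
   Sequence := (seq_len(.N) - 1L) * 50,
   by = movie]

DT
```

Rows that are not selected in `i` are left as NA in `Sequence`.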