Adding a sequence

Question (votes: 0, answers: 3)

I have a (simplified) data frame that looks like this:

Index     StudioEvent
1
2         MovieStart
3
4
5
6
7         MovieEnd
8
9
10        MovieStart
11
12
13
14
15        MovieEnd

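I want to create a third column containing a sequence that starts at 0 and increases in steps of 50. The sequence should start on the row where StudioEvent = MovieStart and end on the row where StudioEvent = MovieEnd. So something like this: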

Index     StudioEvent     Sequence
1
2         MovieStart      0
3                         50
4                         100
5                         150
6                         200
7         MovieEnd        250
8
9
10        MovieStart      0
11                        50
12                        100
13                        150
14                        200
15        MovieEnd        250

Any ideas how I could do this? Thanks in advance.

r sequence
3 Answers
0 votes

Here is a base R option:

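# index ranges from each MovieStart row to its matching MovieEnd row
inds <- Map(`:`, which(df$StudioEvent == "MovieStart"), which(df$StudioEvent == "MovieEnd"))
# write 0, 50, 100, ... into those rows; untouched NA rows stay NA after as.numeric()
df$Sequence <- as.numeric(replace(df$StudioEvent, unlist(inds), (unlist(Map(seq_along, inds)) - 1) * 50))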

which gives

> df
   Index StudioEvent Sequence
1      1        <NA>       NA
2      2  MovieStart        0
3      3        <NA>       50
4      4        <NA>      100
5      5        <NA>      150
6      6        <NA>      200
7      7    MovieEnd      250
8      8        <NA>       NA
9      9        <NA>       NA
10    10  MovieStart        0
11    11        <NA>       50
12    12        <NA>      100
13    13        <NA>      150
14    14        <NA>      200
15    15    MovieEnd      250

Data

> dput(df)
structure(list(Index = 1:15, StudioEvent = c(NA, "MovieStart", 
NA, NA, NA, NA, "MovieEnd", NA, NA, "MovieStart", NA, NA, NA,
NA, "MovieEnd")), row.names = c(NA, -15L), class = "data.frame")
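
To see what the two lines of this answer are doing, it can help to look at the intermediate values. The following is a minimal sketch on the same data. The names `starts`, `ends`, and `offsets` are introduced here for illustration and are not part of the answer, and the sketch assumes that every MovieStart has a matching MovieEnd further down and that movies do not overlap.

```r
df <- data.frame(
  Index = 1:15,
  StudioEvent = c(NA, "MovieStart", NA, NA, NA, NA, "MovieEnd", NA, NA,
                  "MovieStart", NA, NA, NA, NA, "MovieEnd")
)

# Row positions of the markers (which() drops the NA comparisons)
starts <- which(df$StudioEvent == "MovieStart")  # rows 2 and 10
ends   <- which(df$StudioEvent == "MovieEnd")    # rows 7 and 15

# One index range per movie: 2:7 and 10:15
inds <- Map(`:`, starts, ends)

# Position within each movie, shifted to start at 0 and scaled by 50
offsets <- (unlist(Map(seq_along, inds)) - 1) * 50

# Start from an all-NA numeric column and fill only the movie rows
df$Sequence <- NA_real_
df$Sequence[unlist(inds)] <- offsets
```

Starting from `NA_real_` avoids converting the numbers to character and back, which could matter if StudioEvent contained other event labels outside a movie.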

0 votes

You can try this; it worked for me.

n <- data.frame(index = 1:15,
   studio = c(NA,"MS",NA,NA,NA,NA,"ME",NA,NA,"MS",NA,NA,NA,NA,"ME"))

n$studio2 <- 0   # new column: 1 marks a movie start, 2 a movie end
n$studio2[n$studio == "MS"] <- 1 ; n$studio2[n$studio == "ME"] <- 2

n$seq <- NA_real_   # new sequence column; NA outside a movie, as in the requested output
j <- 1              # counter for the current movie

k1 <- which(n$studio == "MS")  # rows where a movie starts
k2 <- which(n$studio == "ME")  # rows where a movie ends

Loop

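for (i in seq_along(n$studio2))
{
    if (n$studio2[i] == 1)   # a movie starts on this row
    {
        k <- k2[j] - k1[j]            # number of steps until the matching end
        w <- seq(0, k * 50, by = 50)  # 0, 50, ..., k * 50
        n$seq[k1[j]:k2[j]] <- w
        j <- j + 1                    # move on to the next movie
    }
}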
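
The loop only does any work on rows where a movie starts, so it could iterate over the start/end pairs directly instead of over every row. A possible simplification is sketched below. It assumes every "MS" has a matching "ME" later on and that movies do not overlap, and it fills rows outside a movie with NA, which is an assumption based on the blank cells in the question.

```r
n <- data.frame(
  index  = 1:15,
  studio = c(NA, "MS", NA, NA, NA, NA, "ME", NA, NA, "MS", NA, NA, NA, NA, "ME")
)

k1 <- which(n$studio == "MS")  # rows where a movie starts
k2 <- which(n$studio == "ME")  # rows where a movie ends

n$seq <- NA_real_  # assumption: rows outside a movie should be NA

# One iteration per movie rather than one per row
for (j in seq_along(k1)) {
  rows <- k1[j]:k2[j]
  n$seq[rows] <- (seq_along(rows) - 1) * 50
}
```

This drops the helper column `studio2` and the manual counter, since `j` now indexes the movies directly.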

0 votes

An option using data.table:

#identify indices between MovieStart and MovieEnd
DT[, cs := cumsum(StudioEvent=="MovieStart") - cumsum(StudioEvent=="MovieEnd")]

#perform rolling join to find the start of movies for MovieEnd and indices between MovieStart and MovieEnd
DT[StudioEvent=="MovieEnd" | cs == 1L, 
    ms := DT[StudioEvent=="MovieStart"][.SD, on=.(Index), roll=Inf, x.Index]
]

#generate sequence
DT[, Sequence := (Index - ms) * 50]

Output:

    Index StudioEvent cs ms Sequence
 1:     1              0 NA       NA
 2:     2  MovieStart  1  2        0
 3:     3              1  2       50
 4:     4              1  2      100
 5:     5              1  2      150
 6:     6              1  2      200
 7:     7    MovieEnd  0  2      250
 8:     8              0 NA       NA
 9:     9              0 NA       NA
10:    10  MovieStart  1 10        0
11:    11              1 10       50
12:    12              1 10      100
13:    13              1 10      150
14:    14              1 10      200
15:    15    MovieEnd  0 10      250
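
The `cs` column is the part of this answer that carries over most easily to other tools: a running count of starts minus a running count of ends is 1 inside a movie and 0 elsewhere. A rough base R sketch of the same idea, without the rolling join, might look like this. The names `grp`, `inside`, and `pos` are hypothetical helpers, and the sketch assumes movies never overlap.

```r
ev <- c("", "MovieStart", "", "", "", "", "MovieEnd", "", "",
        "MovieStart", "", "", "", "", "MovieEnd")

# Running count of starts; it doubles as a movie id
grp <- cumsum(ev == "MovieStart")

# 1 between a start and its end, 0 elsewhere; the end row itself is 0,
# so it is added back explicitly
cs <- grp - cumsum(ev == "MovieEnd")
inside <- cs == 1 | ev == "MovieEnd"

# Row position within each movie id, then blank out rows outside a movie
pos <- ave(seq_along(ev), grp, FUN = seq_along)
Sequence <- ifelse(inside, (pos - 1) * 50, NA)
```

Here the row position within each group plays the role of `Index - ms` in the data.table code.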

Data:

library(data.table)
DT <- fread("Index,StudioEvent 
1,
2,MovieStart
3,
4,
5,
6,
7,MovieEnd 
8,
9,
10,MovieStart
11,
12,
13,
14,
15,MovieEnd")