I am trying to create a new column based on some conditions, but I don't know how to do it. For example, I have this data frame:
area
AR
AR
AR
AM
AM
AR
AR
AR
AM
AM
AM
AM
...
AM
AM
AM
AA
AA
...
AA
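When a run of AR is followed by a run of AM (anywhere from 1 up to 20 rows) and then goes back to AR, I want the new column to contain AR for those rows. When there is a run of AM that stays AM and never goes back to AR, I want the new column to contain AM. AA is fine as it is: AA is always AA. Like this: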
area area2
AR AR
AR AR
AR AR
AM AR
AM AR
AR AR
AR AR
AR AR
AM AM
AM AM
AM AM
AM AM
... ...
AM AM
AM AM
AM AM
AA AA
AA AA
... ...
AA AA
I tried working with sequences, but I don't know whether that is the best approach:
df$seq <- sequence(rle(as.character(df$area))$lengths)
Does anyone know how to do this? Thanks!
Try:
df$area2 <- c('AR', 'AM')[(rev(cumsum(rev(df$area) != 'AM')) == 0) + 1]
Output:
area area2
1 AR AR
2 AR AR
3 AR AR
4 AM AR
5 AM AR
6 AR AR
7 AR AR
8 AR AR
9 AM AM
10 AM AM
11 AM AM
12 AM AM
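The one-liner above packs several steps together, so here is a minimal sketch that unpacks it. The sample data frame is rebuilt from the 12 rows shown in the output, and the intermediate names (`not_am_rev`, `non_am_after`, `trailing_am`) are made up for illustration only.

```r
# Sample data: the 12 AR/AM rows shown in the output above
df <- data.frame(
  area = c("AR", "AR", "AR", "AM", "AM", "AR", "AR", "AR",
           "AM", "AM", "AM", "AM"),
  stringsAsFactors = FALSE
)

# Step 1: walk the column from the last row backwards and flag non-AM rows
not_am_rev <- rev(df$area) != "AM"

# Step 2: running count of non-AM rows, flipped back to the original order,
# so each row holds the number of non-AM rows at or after it
non_am_after <- rev(cumsum(not_am_rev))

# Step 3: a row belongs to the trailing AM block when that count is zero
trailing_am <- non_am_after == 0

# Step 4: FALSE + 1 = 1 picks 'AR', TRUE + 1 = 2 picks 'AM'
df$area2 <- c("AR", "AM")[trailing_am + 1]
```

Indexing a two-element vector with `logical + 1` is just a compact stand-in for `ifelse(trailing_am, 'AM', 'AR')`.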
Another option with data.table (it can be shortened as above, without fifelse):
library(data.table)
setDT(df)[, area2 := fifelse(rleid(area) == max(rleid(area)) & area == 'AM', 'AM', 'AR')]
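The remark about shortening presumably refers to the logical-index trick from the first answer. A sketch of what that might look like, assuming the data.table package is installed and using the same 12-row sample as the output above:

```r
library(data.table)

# Sample data: the 12 AR/AM rows shown in the output above
df <- data.frame(
  area = c("AR", "AR", "AR", "AM", "AM", "AR", "AR", "AR",
           "AM", "AM", "AM", "AM"),
  stringsAsFactors = FALSE
)

# rleid() numbers consecutive runs; the last run has the largest id.
# TRUE + 1 = 2 selects 'AM', FALSE + 1 = 1 selects 'AR'.
setDT(df)[, area2 := c("AR", "AM")[(rleid(area) == max(rleid(area)) & area == "AM") + 1]]
```

Note that `setDT()` converts `df` to a data.table by reference, so `df` itself is modified.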
And a last option, built on the rle you may have already tried:
idx <- with(rle(as.character(df$area)), rep(seq_along(lengths), lengths))
df$area2 <- c('AR', 'AM')[(idx == max(idx) & df$area == 'AM') + 1]
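The output above only covers the AR and AM rows. As far as I can tell, if the trailing AA rows from the question are present, the last run is AA rather than AM, so these snippets would label every row AR. Below is a hedged sketch of one way to handle that case with `rle()`. The function name `relabel_area` and the `max_gap` parameter are hypothetical, and the rule (an AM run of at most 20 rows that is immediately followed by AR becomes AR, everything else stays unchanged) is my reading of the question, not something the original answer confirms.

```r
# Hypothetical helper: relabel short AM runs that are followed by AR
relabel_area <- function(area, max_gap = 20) {
  r <- rle(as.character(area))

  # Value of the run that comes right after each run (NA for the last run)
  next_val <- c(r$values[-1], NA)

  # An AM run is a "bridge" when it is short enough and AR comes back after it
  bridge <- r$values == "AM" &
    r$lengths <= max_gap &
    !is.na(next_val) & next_val == "AR"

  r$values[bridge] <- "AR"

  # Expand the runs back to one value per row
  inverse.rle(r)
}

# Assumed sample: the question's pattern, including trailing AA rows
df <- data.frame(
  area = c("AR", "AR", "AR", "AM", "AM", "AR", "AR", "AR",
           "AM", "AM", "AM", "AM", "AA", "AA"),
  stringsAsFactors = FALSE
)
df$area2 <- relabel_area(df$area)
```

Because only the bridging AM runs are rewritten, AA rows and the final AM block keep their original values.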