Creating a new column when there are repeated values and breakpoints


I'm trying to create a new column based on some conditions and I don't know how to do it. For example, I have this data frame:

area
AR
AR
AR
AM
AM
AR
AR
AR
AM
AM
AM
AM
...
AM
AM
AM
AA
AA
...
AA

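So, when AR occurs x times, is followed by between one and 20 AM values, and then AR comes back, I want the new column to contain AR. When AM occurs x times and only AM follows, without returning to AR, I want the new column to contain AM. AA is fine as it is: AA always stays AA. Like this: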

area    area2
AR      AR
AR      AR
AR      AR
AM      AR
AM      AR
AR      AR
AR      AR
AR      AR
AM      AM
AM      AM
AM      AM
AM      AM
...     ...
AM      AM
AM      AM
AM      AM
AA      AA
AA      AA
...    ...
AA      AA

I tried working with sequences, but I don't know if this is the best approach:

df$seq <- sequence(rle(as.character(df$area))$lengths)

Does anyone know how to do this? Thanks!

r function if-statement repeat breakpoints
1 Answer

Try:

df$area2 <- c('AR', 'AM')[(rev(cumsum(rev(df$area) != 'AM')) == 0) + 1]

Output:

   area area2
1    AR    AR
2    AR    AR
3    AR    AR
4    AM    AR
5    AM    AR
6    AR    AR
7    AR    AR
8    AR    AR
9    AM    AM
10   AM    AM
11   AM    AM
12   AM    AM
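
To see why the one-liner gives this output, it can help to split it into named steps. The following is only a sketch: the sample data is retyped from the 12 rows shown above, `area` is assumed to be a character column, and the intermediate names (`not_am_rev`, `seen_non_am`, `in_trailing_am`) are illustrative and not part of the original answer.

```r
# Sample data retyped from the 12 rows of the output above (assumed character column)
df <- data.frame(
  area = c("AR", "AR", "AR", "AM", "AM", "AR", "AR", "AR", "AM", "AM", "AM", "AM"),
  stringsAsFactors = FALSE
)

# Step 1: walk the column backwards and flag every value that is not AM
not_am_rev <- rev(df$area) != "AM"

# Step 2: running count of non-AM values, flipped back to the original row order;
# each row now holds the number of non-AM values at or after it
seen_non_am <- rev(cumsum(not_am_rev))

# Step 3: a count of zero means the row sits in the final unbroken AM run
in_trailing_am <- seen_non_am == 0

# Step 4: FALSE + 1 picks "AR", TRUE + 1 picks "AM"
df$area2 <- c("AR", "AM")[in_trailing_am + 1]
```

Because only the final run is tested, every row before it is labelled AR. That matches the 12-row sample, but a column that ends in AA (as in the question) would need extra handling.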

Another option with data.table (it can be shortened by dropping fifelse, as above):

library(data.table)
setDT(df)[, area2 := fifelse(rleid(area) == max(rleid(area)) & area == 'AM', 'AM', 'AR')]

And a last, shorter one using rle, which you may already have tried:

idx <- with(rle(as.character(df$area)), rep(seq_along(lengths), lengths))

df$area2 <- c('AR', 'AM')[(idx == max(idx) & df$area == 'AM') + 1]
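
The snippets above only look at the last run of the column, so they do not cover the AA rows or the "up to 20 AM" limit mentioned in the question. A possible generalisation, offered as a sketch rather than a definitive solution, works run by run with `rle`: an AM run is relabelled AR only when it is immediately followed by an AR run and is no longer than a threshold. The function name `relabel_short_am` and the parameter `max_gap` are hypothetical, and the rule "followed by AR, at most 20 long" is one reading of the question.

```r
# Hypothetical helper: relabel short AM runs that are followed by AR
relabel_short_am <- function(area, max_gap = 20) {
  r <- rle(as.character(area))

  is_am      <- r$values == "AM"
  # Value of the next run; the last run has no successor, hence NA
  next_is_ar <- c(r$values[-1], NA) == "AR"

  # AM runs that return to AR and are short enough become AR;
  # which() drops any NA produced by the last run
  swap <- is_am & next_is_ar & r$lengths <= max_gap
  r$values[which(swap)] <- "AR"

  # Expand the runs back to one value per row
  inverse.rle(r)
}

# Example with a trailing AA block, as in the question
area <- c(rep("AR", 3), rep("AM", 2), rep("AR", 3), rep("AM", 4), rep("AA", 2))
area2 <- relabel_short_am(area)
```

With this rule, AA and any AM run that never returns to AR are left untouched, and an AM run longer than `max_gap` stays AM even if AR follows it.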