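I want to count the consecutive occurrences of each value and assign that count to every row of the run in a new column. Here is an example of the input and the desired output: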
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"))
dataset$count <- c(1,2,2,2,2,1,4,4,4,4,1,1)
dataset
input count
a 1
b 2
b 2
a 2
a 2
c 1
a 4
a 4
a 4
a 4
b 1
c 1
Using rle(dataset$input), I can get the number of occurrences of each run. But I want the output in the format shown above.
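My question is similar to "R: count consecutive occurrences of values in a single column", but there the output is a running sequence within each run, whereas I want to assign the run's count itself to each value.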
You can repeat the lengths component returned by rle, lengths times:
with(rle(dataset$input), rep(lengths, lengths))
#[1] 1 2 2 2 2 1 4 4 4 4 1 1
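To see why the one-liner above works, it may help to look at the two pieces rle returns. The following is a minimal sketch in base R; the variable names x, r and count are only illustrative and are not part of the original answer.

```r
# Same values as the input column, kept as a plain character vector
x <- c("a","b","b","a","a","c","a","a","a","a","b","c")

# rle() collapses the vector into runs: one value and one length per run
r <- rle(x)
r$values   # the value of each run, in order
r$lengths  # how many times each run's value repeats

# Repeating each run length by itself expands the runs back to the
# original length, so every row carries the size of the run it belongs to
count <- rep(r$lengths, times = r$lengths)
count
```

Because rep is given the same vector as both the values to repeat and the repeat counts, the result always has the same length as x.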
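Using dplyr, we can use lag to start a new group whenever the value changes, then count the number of rows in each group: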
library(dplyr)
# gr increases by one each time input differs from the previous row
dataset %>%
group_by(gr = cumsum(input != lag(input, default = first(input)))) %>%
mutate(count = n())
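The grouping trick in the dplyr pipeline can also be traced step by step without any packages. This is a base R sketch of the same idea, not the answer's own code: the names changed, gr and count are illustrative, and ave stands in for group_by plus mutate.

```r
x <- c("a","b","b","a","a","c","a","a","a","a","b","c")

# Compare each element with its predecessor; the first element has no
# predecessor, so it is marked FALSE (the role played by default = first(input))
changed <- c(FALSE, x[-1] != x[-length(x)])

# A running total of the change flags gives every run its own id
gr <- cumsum(changed)

# ave() applies length() within each run id and returns a vector
# aligned with the original rows
count <- ave(seq_along(x), gr, FUN = length)

data.frame(input = x, gr = gr, count = count)
```

Printing the data frame shows gr staying constant within a run and stepping up at each change, which is what lets a per-group row count be written back to every row.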
And with data.table, using rleid to build the run ids:
library(data.table)
setDT(dataset)[, count:= .N, rleid(input)]
Data
Make sure the input column is character rather than factor.
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
stringsAsFactors = FALSE)
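If the data frame already exists with a factor column (the default for data.frame before R 4.0.0), one option is to convert the column before counting, since rle refuses a factor. A minimal sketch, assuming a small hypothetical data frame named df:

```r
# Hypothetical data frame whose input column was created as a factor
df <- data.frame(input = factor(c("a","b","b","a")))

# rle() only accepts atomic vectors, so convert the factor to character first
df$input <- as.character(df$input)

# The run-length count from the answer now works as before
df$count <- with(rle(df$input), rep(lengths, lengths))
df
```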