# dataframe
library(dplyr)  # re-exports tibble()

df <- tibble(
  # groups
  group = c("A", "A", "A", "B", "A", "A", "A", "C", "B", "B", "C", "A"),
  # running count of all groups
  count = c(1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7)
)
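My data contains a series of repeating groups. In this example, rows 1-3 are the first occurrence of group "A", row 4 is the first occurrence of group "B", rows 5-7 are the second occurrence of group "A", row 8 is the first occurrence of group "C", and so on. `count` is a running count across all groups.

I want a running count of the occurrences of each individual group, giving the output shown in `individual_running_count`. The order of the groups and the number of times each group occurs are not always the same, so I can't rely on fixed criteria such as row numbers or the order of the `group` column.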
# df with required output
df <- tibble(
  # groups
  group = c("A", "A", "A", "B", "A", "A", "A", "C", "B", "B", "C", "A"),
  # order in which groups occurred
  order = c(1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7),
  # required column output
  individual_running_count = c(1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 3)
)
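# Requires dplyr >= 1.1.0 for consecutive_id() and the .by argument.
# df here is the first data frame from the question (the one with the count column).
mutate(df, individual_running_count = consecutive_id(count), .by = group)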
# A tibble: 12 x 3
   group count individual_running_count
   <chr> <dbl>                    <int>
 1 A         1                        1
 2 A         1                        1
 3 A         1                        1
 4 B         2                        1
 5 A         3                        2
 6 A         3                        2
 7 A         3                        2
 8 C         4                        1
 9 B         5                        2
10 B         5                        2
11 C         6                        2
12 A         7                        3
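To see why numbering runs within each group gives the desired column, here is a minimal base R sketch of the idea for group "A" alone. It is only an illustration of the logic, not how `consecutive_id()` is implemented; the variable names `count_a`, `starts_new_run`, and `run_id` are made up for this example.

```r
# count values for the rows that belong to group "A" (rows 1-3, 5-7 and 12)
count_a <- c(1, 1, 1, 3, 3, 3, 7)

# TRUE wherever the value differs from the previous row;
# the first row always starts a new run
starts_new_run <- c(TRUE, count_a[-1] != count_a[-length(count_a)])

# The cumulative sum of the run starts numbers the runs 1, 2, 3, ...
run_id <- cumsum(starts_new_run)
print(run_id)
```

Because `count` only changes when a new block of rows begins, each distinct run inside a group corresponds to one occurrence of that group.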
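Base R:
# Uses the order column from the second data frame in the question;
# substitute df$count when working with the first one.
df$irc <- ave(df$order, df$group, FUN = \(x) match(x, unique(x)))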
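The `match(x, unique(x))` idiom replaces each value with the position of its first appearance among the distinct values, which within one group is the occurrence number. The following self-contained sketch applies it to plain vectors copied from the question and checks the result against the required column. The helper name `occurrence_index` is hypothetical, and `function(x)` is used instead of `\(x)` so that it also runs on R versions older than 4.1.

```r
group <- c("A", "A", "A", "B", "A", "A", "A", "C", "B", "B", "C", "A")
count <- c(1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7)

# Hypothetical helper: index of each value among the distinct values of x,
# in order of first appearance
occurrence_index <- function(x) match(x, unique(x))

# ave() applies the helper separately to the count values of each group
# and returns the results in the original row order
irc <- ave(count, group, FUN = occurrence_index)

# Required output from the question
expected <- c(1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 3)
stopifnot(all(irc == expected))
```

This relies on the running count never returning to an earlier value, which holds here because it only increases.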
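data.table approach:
library(data.table)
# rleid() numbers the runs of count within each group;
# := adds the irc column by reference
setDT(df)[, irc := rleid(count), by = group]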