I have a larger dataset that contains many sequences. Example:
Number <- c(1, 1, 1, 1, 2, 2, 2, 2)
Day <- c(1, 2, 3, 4, 1, 2, 3, 4)
Letter <- c("a", "a", "a", "a", "b", "b", "b", "b")
df <- data.frame(Number, Day, Letter)
df
#> Number Day Letter
#> 1 1 1 a
#> 2 1 2 a
#> 3 1 3 a
#> 4 1 4 a
#> 5 2 1 b
#> 6 2 2 b
#> 7 2 3 b
#> 8 2 4 b
Created on 2023-04-08 with reprex v2.0.2
I want to create a new column that tells me when a new sequence starts. Example:
df_des
#> Number Day Letter first
#> 1 1 1 a yes
#> 2 1 2 a no
#> 3 1 3 a no
#> 4 1 4 a no
#> 5 2 1 b yes
#> 6 2 2 b no
#> 7 2 3 b no
#> 8 2 4 b no
Here is a base R approach:
# Columns to consider for sequence change
cols <- c('Number', 'Letter')
# Create a new column with everything as "No"
df$first <- 'No'
# Replace the first value of each sequence to "Yes"
df$first[!duplicated(df[cols])] <- 'Yes'
df
# Number Day Letter first
#1 1 1 a Yes
#2 1 2 a No
#3 1 3 a No
#4 1 4 a No
#5 2 1 b Yes
#6 2 2 b No
#7 2 3 b No
#8 2 4 b No
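To see why this works, it may help to look at the logical vector that `duplicated()` returns for the selected columns. The following is a minimal sketch that reuses the example data from the question; the name `is_first` is only for illustration.

```r
df <- data.frame(
  Number = c(1, 1, 1, 1, 2, 2, 2, 2),
  Day    = c(1, 2, 3, 4, 1, 2, 3, 4),
  Letter = c("a", "a", "a", "a", "b", "b", "b", "b")
)
cols <- c("Number", "Letter")

# duplicated() on a data frame compares whole rows: it is FALSE the first
# time a Number/Letter combination appears and TRUE for every repeat.
# Negating it marks the first row of each combination.
is_first <- !duplicated(df[cols])
is_first
#> [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE

# Row positions where a new combination first appears
which(is_first)
#> [1] 1 5
```

Using `is_first` as a row index is what lets `df$first[...] <- 'Yes'` overwrite only rows 1 and 5.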
Here is a dplyr approach that uses ifelse with row_number():
library(dplyr)  # .by requires dplyr >= 1.1.0
df %>%
  mutate(first = ifelse(row_number() == 1, "yes", "no"), .by = Number)
Number Day Letter first
1 1 1 a yes
2 1 2 a no
3 1 3 a no
4 1 4 a no
5 2 1 b yes
6 2 2 b no
7 2 3 b no
8 2 4 b no
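If dplyr is not available, the same idea of numbering rows within each group can probably be sketched in base R with `ave()`. This is an illustrative equivalent rather than part of the answer above, and `row_in_group` is a made-up helper name.

```r
df <- data.frame(
  Number = c(1, 1, 1, 1, 2, 2, 2, 2),
  Day    = c(1, 2, 3, 4, 1, 2, 3, 4),
  Letter = c("a", "a", "a", "a", "b", "b", "b", "b")
)

# ave() applies FUN within each group of Number; seq_along() then numbers
# the rows 1, 2, 3, ... inside every group, like a grouped row_number().
row_in_group <- ave(seq_along(df$Number), df$Number, FUN = seq_along)

# The first row of each group gets "yes", all others "no"
df$first <- ifelse(row_in_group == 1, "yes", "no")
df
```

For the example data, `row_in_group` runs 1 to 4 twice, so `first` is "yes" in rows 1 and 5 only.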
A base R option:
transform(df, first = c("no", "yes")[1 + !duplicated(Number)])
Output:
Number Day Letter first
1 1 1 a yes
2 1 2 a no
3 1 3 a no
4 1 4 a no
5 2 1 b yes
6 2 2 b no
7 2 3 b no
8 2 4 b no
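Note that `duplicated()` flags only the first time a value appears anywhere in the column. If the larger dataset can contain the same `Number` again later, after a different one, and that should count as a new sequence, one possible alternative is to compare each value with the one before it. Whether this case occurs is an assumption, since the question does not say, and the `Number` vector below is made up for illustration.

```r
# Hypothetical data: the value 1 comes back after a run of 2s
Number <- c(1, 1, 2, 2, 1, 1)

# TRUE for the first element and wherever the value differs from the
# previous one, i.e. at the start of every run
starts <- c(TRUE, Number[-1] != Number[-length(Number)])

# Same indexing trick as above: 1 + FALSE picks "no", 1 + TRUE picks "yes"
first <- c("no", "yes")[1 + starts]
first
#> [1] "yes" "no"  "yes" "no"  "yes" "no"
```

With `!duplicated(Number)` on the same vector, the fifth element would be "no", because the value 1 has already been seen.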