Identifying when a sequence stops

Question

I'm new here, so apologies if this has been answered before. I am trying to identify sequences in a dataset.

Example:

id	lengthofstay
1	1
1	2
1	3
2	1
2	2
3	1
3	2
3	3
3	4

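I then want to create a variable that is 0 while the sequence is still ongoing and 1 when the sequence ends.

For example:

id	lengthofstay	newvariable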
1	1	0
1	2	0
1	3	1
2	1	0
2	2	1
3	1	0
3	2	0
3	3	0
3	4	1

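Thanks in advance for your help. If there is anything else useful I can provide, please let me know.

I am very new to R, so apologies, but I don't know where to start. Thanks again for your help!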

Answer 1

You can use `ave` + `max`:

> transform(df, newvariable = +(ave(lengthofstay, id, FUN = max) == lengthofstay))
  id lengthofstay newvariable
1  1            1           0
2  1            2           0
3  1            3           1
4  2            1           0
5  2            2           1
6  3            1           0
7  3            2           0
8  3            3           0
9  3            4           1
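To make the one-liner above easier to follow, here is a rough step-by-step sketch of the same idea. The data frame is rebuilt from the question's example, and the intermediate names `group_max` and `is_last` are illustrative only, not part of the original answer.

```r
# Example data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  lengthofstay = c(1:3, 1:2, 1:4)
)

# Step 1: the maximum lengthofstay within each id, repeated on every row of that id
group_max <- ave(df$lengthofstay, df$id, FUN = max)

# Step 2: TRUE on the rows that hold their group's maximum
is_last <- group_max == df$lengthofstay

# Step 3: unary plus converts TRUE/FALSE to 1/0
df$newvariable <- +is_last
df
```

This relies on `lengthofstay` reaching its maximum only on the last row of each `id`, which holds for the example data.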

Answer 2

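You can exploit the fact that the difference between consecutive elements of the series is always greater than zero, except where the series restarts: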

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  lengthofstay = c(1:3, 1:2, 1:4)
)
df$newvariable <- c(diff(df$lengthofstay) <= 0, 1)
df
##   id lengthofstay newvariable
## 1  1            1           0
## 2  1            2           0
## 3  1            3           1
## 4  2            1           0
## 5  2            2           1
## 6  3            1           0
## 7  3            2           0
## 8  3            3           0
## 9  3            4           1

This solution does not use the `id` column and relies only on `lengthofstay`.
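One consequence of ignoring `id` is worth illustrating. The sketch below uses hypothetical data (not from the question) in which the second `id` keeps counting upward, so `lengthofstay` never drops. Whether such data can occur is an assumption. The alternative flag, `by_id`, is an illustrative name and assumes the rows are already sorted by `id`.

```r
# Hypothetical data: lengthofstay does not restart when id changes
df <- data.frame(
  id = c(1, 1, 2, 2),
  lengthofstay = c(1, 2, 3, 4)
)

# The diff()-based flag only sees drops in lengthofstay, so it misses the end of id 1
df$by_diff <- c(diff(df$lengthofstay) <= 0, 1)

# Alternative: flag a row when the next row belongs to a different id;
# the final row always ends a sequence
df$by_id <- c(as.integer(df$id[-1] != df$id[-nrow(df)]), 1L)
df
```

On the question's own data both flags agree, because every new `id` restarts at 1.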

Answer 3

A data.table solution:

library(data.table)

dt = data.table(
  id = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  lengthofstay = c(1:3, 1:2, 1:4)
)
dt[, newvariable := 0]
dt[, newvariable := lengthofstay == max(lengthofstay), by = id]
dt

Answer 4

Using `dplyr`:

# If you don't have `dplyr` installed run:
# install.packages("dplyr")

library(dplyr)

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3)
)

df %>%
  # All computations are performed within group by `id`:
  group_by(id) %>%
  # `mutate` creates columns sequentially. We create `lengthofstay` first and
  # then we can use `lengthofstay` in the creation of `newvariable`.
  mutate(
    # `row_number` returns the order of each row within group.
    lengthofstay = row_number(),
    # `lengthofstay == max(lengthofstay)` returns a logical with `FALSE` if the
    # row is not the last element of the group, and `TRUE` otherwise.
    # `as.numeric` then converts `FALSE` to 0 and `TRUE` to 1.
    newvariable = as.numeric(lengthofstay == max(lengthofstay))
  ) %>%
  # We don't need to have the data grouped anymore so we call `ungroup`.
  ungroup()

This may be a bit verbose, but (in my opinion) it is also easier to read.

Without comments:

library(dplyr)

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3)
)

df %>%
  group_by(id) %>%
  mutate(
    lengthofstay = row_number(),
    newvariable = as.numeric(lengthofstay == max(lengthofstay))
  ) %>%
  ungroup()

Answer 5

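Here is a more general `dplyr` solution. It is more general because it groups by `id` and also takes adjacent values into account. Specifically, it records a 1 if the observation is the last one for its `id`, or if the next value of `lengthofstay` is not equal to the current value + 1.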

library(dplyr)

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  lengthofstay = c(1:3, 1:2, 1:4)
)


df %>% 
  group_by(id) %>% 
  mutate(newvariable = ifelse(row_number() == n() | lengthofstay+1 != lead(lengthofstay), 1, 0))
#> # A tibble: 9 × 3
#> # Groups:   id [3]
#>      id lengthofstay newvariable
#>   <dbl>        <int>       <dbl>
#> 1     1            1           0
#> 2     1            2           0
#> 3     1            3           1
#> 4     2            1           0
#> 5     2            2           1
#> 6     3            1           0
#> 7     3            2           0
#> 8     3            3           0
#> 9     3            4           1

Created on 2023-04-11 with reprex v2.0.2
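The example data never exercises the second half of that rule, so here is a minimal base R sketch of the same logic on hypothetical data in which `id` 1 contains two separate stays. It does not use `dplyr`. The helper column `next_stay` is an illustrative name, and rows are assumed to be in order within each `id`.

```r
# Hypothetical data: id 1 has a stay of 3 followed by a stay of 2
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2),
  lengthofstay = c(1:3, 1:2, 1:2)
)

# Within each id, look one row ahead; NA marks the last row of the id
next_stay <- ave(df$lengthofstay, df$id, FUN = function(x) c(x[-1], NA))

# A sequence ends on the last row of an id, or where the next value is not current + 1
df$newvariable <- as.integer(is.na(next_stay) | next_stay != df$lengthofstay + 1)
df
```

Here the third row is flagged even though it is not the last row of `id` 1, which a comparison against the per-group maximum alone would not detect for the second stay.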
