Identifying when a sequence stops

Question  Votes: 0  Answers: 5

I am new here, so apologies if this has been answered before. I am trying to identify sequences in a dataset.

Example:

id lengthofstay
1 1
1 2
1 3
2 1
2 2
3 1
3 2
3 3
3 4

I would then like to create a variable that is 0 while the sequence is still in progress and 1 when the sequence ends.

For example:

id lengthofstay newvariable
1 1 0
1 2 0
1 3 1
2 1 0
2 2 1
3 1 0
3 2 0
3 3 0
3 4 1

Thanks in advance for your help. If I can provide anything else useful, please let me know.

I am very new to R, so apologies, but I do not know where to start. Thanks again for your help!

r sequence
5 Answers
2 votes

You can use ave + max:

> transform(df, newvariable = +(ave(lengthofstay, id, FUN = max) == lengthofstay))
  id lengthofstay newvariable
1  1            1           0
2  1            2           0
3  1            3           1
4  2            1           0
5  2            2           1
6  3            1           0
7  3            2           0
8  3            3           0
9  3            4           1
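
The one-liner above assumes a data frame named df already exists. A minimal sketch that builds the question's sample data and splits the call into steps (the intermediate name group_max is illustrative and not part of the original answer):

```r
# Sample data from the question.
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  lengthofstay = c(1:3, 1:2, 1:4)
)

# ave() applies FUN within each id and returns a vector as long as the input,
# so every row receives the maximum lengthofstay of its own group.
group_max <- ave(df$lengthofstay, df$id, FUN = max)

# The comparison is TRUE on rows that reach their group's maximum;
# unary + converts TRUE/FALSE to 1/0.
df$newvariable <- +(group_max == df$lengthofstay)
df
```

Note that this flags the row holding the largest value in each id, which is the last row only when lengthofstay increases throughout the group.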

2 votes

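You can exploit the fact that the difference between consecutive elements of the series is always greater than zero, except where the series restarts: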

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  lengthofstay = c(1:3, 1:2, 1:4)
)
df$newvariable <- c(diff(df$lengthofstay) <= 0, 1)
df
##   id lengthofstay newvariable
## 1  1            1           0
## 2  1            2           0
## 3  1            3           1
## 4  2            1           0
## 5  2            2           1
## 6  3            1           0
## 7  3            2           0
## 8  3            3           0
## 9  3            4           1

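This solution does not use the id column; it relies only on lengthofstay, so it assumes each new sequence starts at a value no greater than the last value of the previous one.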
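
Because id is ignored, the diff approach depends on the values dropping (or staying equal) whenever a new sequence begins. A small sketch with made-up data where that does not hold, next to a base R variant that looks at id instead (df2, by_diff and by_id are hypothetical names, not from the question):

```r
# Hypothetical data: the second id starts at 3, so lengthofstay never drops.
df2 <- data.frame(
  id = c(1, 1, 2, 2),
  lengthofstay = c(1, 2, 3, 4)
)

# diff-based flag: misses the end of id 1 because 3 - 2 > 0.
by_diff <- c(diff(df2$lengthofstay) <= 0, 1)

# id-based flag: a row ends a sequence when the next row has a different id;
# the final row always ends one. Assumes rows of the same id are contiguous.
by_id <- c(as.integer(diff(df2$id) != 0), 1L)

by_diff  # 0 0 0 1
by_id    # 0 1 0 1
```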


2 votes

A data.table solution:

library(data.table)

dt = data.table(
  id = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  lengthofstay = c(1:3, 1:2, 1:4)
)
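# Within each id, flag the row whose lengthofstay equals the group maximum
# (1 = sequence ends here, 0 = still in progress).
dt[, newvariable := as.numeric(lengthofstay == max(lengthofstay)), by = id]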
dt
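
A possible variation, assuming the rows are already ordered within each id: flag the last row of each group by position rather than by value. .N is data.table's built-in count of rows in the current group; dt2 and the column name is_last are illustrative only.

```r
library(data.table)

dt2 <- data.table(
  id = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  lengthofstay = c(1:3, 1:2, 1:4)
)

# seq_len(.N) numbers the rows within each id; the row whose position equals
# .N (the group size) is the last one. as.integer() turns TRUE/FALSE into 1/0.
dt2[, is_last := as.integer(seq_len(.N) == .N), by = id]
dt2
```

On the sample data this gives the same flags as the max-based version, since lengthofstay increases within every id.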

0 votes

Using dplyr:

# If you don't have `dplyr` installed run:
# install.packages("dplyr")

library(dplyr)

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3)
)

df %>%
  # All computations are performed within group by `id`:
  group_by(id) %>%
  # `mutate` creates columns sequentially. We create `lengthofstay` first and
  # then we can use `lengthofstay` in the creation of `newvariable`.
  mutate(
    # `row_number` returns the order of each row within group.
    lengthofstay = row_number(),
    # `lengthofstay == max(lengthofstay)` returns a logical with `FALSE` if the
    # row is not the last element of the group, and `TRUE` otherwise.
    # `as.numeric` then converts `FALSE` to 0 and `TRUE` to 1.
    newvariable = as.numeric(lengthofstay == max(lengthofstay))
  ) %>%
  # We don't need to have the data grouped anymore so we call `ungroup`.
  ungroup()

This may be a bit verbose, but (in my opinion) it is also easier to read.


Without comments:

library(dplyr)

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3)
)

df %>%
  group_by(id) %>%
  mutate(
    lengthofstay = row_number(),
    newvariable = as.numeric(lengthofstay == max(lengthofstay))
  ) %>%
  ungroup()

0 votes

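Here is a more general dplyr solution. It is more general because it groups by id and also takes adjacent values into account. Specifically, it records a 1 if the observation is the last one for its id, or if the next value of lengthofstay is not equal to the current value + 1.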

library(dplyr)

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  lengthofstay = c(1:3, 1:2, 1:4)
)


df %>% 
  group_by(id) %>% 
  mutate(newvariable = ifelse(row_number() == n() | lengthofstay+1 != lead(lengthofstay), 1, 0))
#> # A tibble: 9 × 3
#> # Groups:   id [3]
#>      id lengthofstay newvariable
#>   <dbl>        <int>       <dbl>
#> 1     1            1           0
#> 2     1            2           0
#> 3     1            3           1
#> 4     2            1           0
#> 5     2            2           1
#> 6     3            1           0
#> 7     3            2           0
#> 8     3            3           0
#> 9     3            4           1

Created on 2023-04-11 with reprex v2.0.2
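
To see where the adjacent-value check makes a difference, here is a base R sketch of the same rule on made-up data in which one id contains two runs. The names df3, ends_sequence, by_rule and by_max are hypothetical and not part of the original answer.

```r
# Hypothetical data: id 1 runs 1, 2, 3 and then restarts with 1, 2.
df3 <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2),
  lengthofstay = c(1, 2, 3, 1, 2, 1, 2)
)

# Within one id: a row ends a sequence if it is the last row, or if the next
# value is not the current value + 1.
ends_sequence <- function(x) {
  nxt <- c(x[-1], NA)  # next value; NA for the last row of the group
  as.integer(is.na(nxt) | nxt != x + 1)
}

by_rule <- ave(df3$lengthofstay, df3$id, FUN = ends_sequence)

# For comparison: the max-based flag only marks the group maximum, so it
# misses the end of the second run inside id 1.
by_max <- +(ave(df3$lengthofstay, df3$id, FUN = max) == df3$lengthofstay)

by_rule  # 0 0 1 0 1 0 1
by_max   # 0 0 1 0 0 0 1
```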
