Count the years between occurrences of 1 in a data frame, by group

Question (votes: 0, answers: 3)

I have the following data frame:

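country   year  conflict
country1  1990  1
country1  1991  0
country1  1992  0
country1  1993  0
country1  1994  1
country1  1995  1
country1  1996  0
country1  1997  0
country1  1998  0
country1  1999  0
country1  2014  0
country2  1990  0
country2  1991  1
country2  1992  0
country2  1995  1
country2  1996  0
country2  2000  1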

and I would like to create the following variable, peace:

country   year  conflict  peace
country1  1990  1         0
country1  1991  0         0
country1  1992  0         1
country1  1993  0         2
country1  1994  1         3
country1  1995  1         0
country1  1996  0         0
country1  1997  0         1
country1  1998  0         2
country1  1999  0         3
country1  2014  0         15
country2  1990  0         0
country2  1991  1         1
country2  1992  0         0
country2  1995  1         3
country2  1996  0         0
country2  2000  1         4

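peace is the number of years between two conflicts (more precisely, counted from the year after a conflict year up to the next conflict year). That is why peace is 0 in the year right after a conflict year.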

Note that years missing from the observations are counted as well.

r dataframe dplyr
3 Answers

4 votes

I think your 15 should be 18, but otherwise ...

Using your "expected" column as the starting point (for a side-by-side comparison):

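library(dplyr)  # .by needs dplyr >= 1.1.0

quux |>
  mutate(
    # conflict year carried over from the previous row; first row per country gets min(year) - 1
    last_conflict = lag(if_else(conflict == 1, year, NA), default = min(year) - 1),
    .by = country) |>
  # carry the most recent conflict year down over the NA rows
  tidyr::fill(last_conflict, .direction = "down") |>
  # full years elapsed since that conflict
  mutate(peace2 = year - last_conflict - 1)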
#     country year conflict peace last_conflict peace2
# 1  country1 1990        1     0          1989      0
# 2  country1 1991        0     0          1990      0
# 3  country1 1992        0     1          1990      1
# 4  country1 1993        0     2          1990      2
# 5  country1 1994        1     3          1990      3
# 6  country1 1995        1     0          1994      0
# 7  country1 1996        0     0          1995      0
# 8  country1 1997        0     1          1995      1
# 9  country1 1998        0     2          1995      2
# 10 country1 1999        0     3          1995      3
# 11 country1 2014        0    15          1995     18
# 12 country2 1990        0     0          1989      0
# 13 country2 1991        1     1          1989      1
# 14 country2 1992        0     0          1991      0
# 15 country2 1995        1     3          1991      3
# 16 country2 1996        0     0          1995      0
# 17 country2 2000        1     4          1995      4
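To locate where the two columns disagree, they can be compared row by row. The following is a minimal base R sketch, not part of the original answer: it recomputes the count for the country1 rows only, with a plain loop instead of `lag()` and `fill()`, and the names `yr`, `cf`, `last_cf`, `peace_expected`, `peace_calc` and `mismatch` are made up for illustration.

```r
# country1 rows from the question (illustrative variable names)
yr <- c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2014L)
cf <- c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L)
peace_expected <- c(0L, 0L, 1L, 2L, 3L, 0L, 0L, 1L, 2L, 3L, 15L)

# Year of the most recent conflict strictly before each row.
# Before any conflict has been seen, fall back to min(yr) - 1, as the answer does.
last_cf <- integer(length(yr))
prev <- min(yr) - 1L
for (i in seq_along(yr)) {
  last_cf[i] <- prev
  if (cf[i] == 1L) prev <- yr[i]
}

# Full years elapsed since that conflict
peace_calc <- yr - last_cf - 1L

# Rows where the recomputed value differs from the question's column
mismatch <- which(peace_calc != peace_expected)
data.frame(year = yr[mismatch],
           expected = peace_expected[mismatch],
           calculated = peace_calc[mismatch])
```

Only the 2014 row differs: the last conflict before it was in 1995, and 2014 - 1995 - 1 = 18.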

Data

quux <- structure(list(country = c("country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country2", "country2", "country2", "country2", "country2", "country2"), year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2014L, 1990L, 1991L, 1992L, 1995L, 1996L, 2000L), conflict = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L), peace = c(0L, 0L, 1L, 2L, 3L, 0L, 0L, 1L, 2L, 3L, 15L, 0L,  1L, 0L, 3L, 0L, 4L)), class = "data.frame", row.names = c(NA, -17L))

1 vote

This solution uses dplyr::consecutive_id() to create a temporary grouping variable for each period of peace or conflict:

library(dplyr)

dat %>%
  mutate(
    period = lag(consecutive_id(conflict), default = 1),
    .by = country
  ) %>%
  mutate(
    peace = year - min(year) + 1,
    .by = c(country, period)
  ) %>%
  mutate(
    peace = if_else(lag(conflict, default = 1) == 1, 0, peace),
    period = NULL
  )

Result (the country2 rows do not match the expected peace values in the question):

# A tibble: 17 × 4
   country   year conflict peace
   <chr>    <dbl>    <dbl> <dbl>
 1 country1  1990        1     0
 2 country1  1991        0     0
 3 country1  1992        0     1
 4 country1  1993        0     2
 5 country1  1994        1     3
 6 country1  1995        1     0
 7 country1  1996        0     0
 8 country1  1997        0     1
 9 country1  1998        0     2
10 country1  1999        0     3
11 country1  2014        0    18
12 country2  1990        0     1
13 country2  1991        1     2
14 country2  1992        0     0
15 country2  1995        1     1
16 country2  1996        0     0
17 country2  2000        1     1
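To see how the period variable behaves on the country2 rows, where conflict changes on every row, here is a small base R sketch. It is an illustration rather than the answer's own code: `cumsum(c(TRUE, diff(conflict) != 0))` is assumed as a stand-in for `dplyr::consecutive_id(conflict)`, `c(1L, head(run_id, -1))` as a stand-in for `lag(..., default = 1)`, and `run_id` and `count` are made-up names. Only the first two `mutate()` steps are reproduced.

```r
# country2 rows from the question
year     <- c(1990L, 1991L, 1992L, 1995L, 1996L, 2000L)
conflict <- c(0L, 1L, 0L, 1L, 0L, 1L)

# Stand-in for dplyr::consecutive_id(conflict):
# the id increases whenever the value changes from one row to the next.
run_id <- cumsum(c(TRUE, diff(conflict) != 0))

# Stand-in for lag(run_id, default = 1)
period <- c(1L, head(run_id, -1))

# Years counted from the first year of each period, as in the second mutate()
count <- year - ave(year, period, FUN = min) + 1L

data.frame(year, conflict, run_id, period, count)
```

From 1992 onward every period holds a single row, so the count is 1 on each of them and the gaps 1992 to 1995 and 1996 to 2000 are never measured.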

0 votes

We can use cumsum and lag together to create a helper column for grouping, then simply take the difference between year and first(year) within each group:

library(dplyr)

df1 %>% 
  mutate(helper = cumsum(lag(conflict, default = 1) == 1), .by = country) %>% 
  mutate(peace = year - first(year), .by = c(country, helper)) %>% 
  select(-helper)

#>     country year conflict peace
#> 1  country1 1990        1     0
#> 2  country1 1991        0     0
#> 3  country1 1992        0     1
#> 4  country1 1993        0     2
#> 5  country1 1994        1     3
#> 6  country1 1995        1     0
#> 7  country1 1996        0     0
#> 8  country1 1997        0     1
#> 9  country1 1998        0     2
#> 10 country1 1999        0     3
#> 11 country1 2014        0    18
#> 12 country2 1990        0     0
#> 13 country2 1991        1     1
#> 14 country2 1992        0     0
#> 15 country2 1995        1     3
#> 16 country2 1996        0     0
#> 17 country2 2000        1     4
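To make the grouping step more concrete, here is a sketch of how the helper column comes out for the country2 rows. It uses base R only: `c(1L, head(conflict, -1))` is assumed as a stand-in for `dplyr::lag(conflict, default = 1)`, `ave()` replaces the grouped `mutate()`, and `prev_conflict` is a made-up name.

```r
# country2 rows from the question
year     <- c(1990L, 1991L, 1992L, 1995L, 1996L, 2000L)
conflict <- c(0L, 1L, 0L, 1L, 0L, 1L)

# Stand-in for dplyr::lag(conflict, default = 1):
# shift the vector down by one and treat the row before the first as a conflict.
prev_conflict <- c(1L, head(conflict, -1))

# A new group starts on every row that directly follows a conflict row
helper <- cumsum(prev_conflict == 1L)

# Within each group, count years from the group's first observed year
peace <- year - ave(year, helper, FUN = function(y) y[1])

data.frame(year, conflict, helper, peace)
```

Because the count starts at the first observed year of each group, this agrees with the `year - last_conflict - 1` calculation only when the year immediately after a conflict is present in the data, which holds for every conflict in this data set.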

Created on 2024-02-22 with reprex v2.0.2
