Count the years between occurrences of 1 in a column, by group

Question

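I have the following data frame:

country	year	conflict
country1	1990	1
country1	1991	0
country1	1992	0
country1	1993	0
country1	1994	1
country1	1995	1
country1	1996	0
country1	1997	0
country1	1998	0
country1	1999	0
country1	2014	0
country2	1990	0
country2	1991	1
country2	1992	0
country2	1995	1
country2	1996	0
country2	2000	1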

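and I would like to create the following variable, peace:

country	year	conflict	peace
country1	1990	1	0
country1	1991	0	0
country1	1992	0	1
country1	1993	0	2
country1	1994	1	3
country1	1995	1	0
country1	1996	0	0
country1	1997	0	1
country1	1998	0	2
country1	1999	0	3
country1	2014	0	15
country2	1990	0	0
country2	1991	1	1
country2	1992	0	0
country2	1995	1	3
country2	1996	0	0
country2	2000	1	4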

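peace is the number of years between two conflicts, counted from the year after a conflict up to and including the next conflict year. The count restarts at 0 in the year after a conflict.

Note that missing observation years are counted as well.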

Answer 1

I think your 15 should be 18, but otherwise ...

Using your "expected" output as a starting point (for side-by-side comparison):

library(dplyr)

quux |>
  mutate(
    last_conflict = lag(if_else(conflict == 1, year, NA), default = min(year) - 1),
    .by = country) |>
  tidyr::fill(last_conflict, .direction = "down") |>
  mutate(peace2 = year - last_conflict - 1)
#     country year conflict peace last_conflict peace2
# 1  country1 1990        1     0          1989      0
# 2  country1 1991        0     0          1990      0
# 3  country1 1992        0     1          1990      1
# 4  country1 1993        0     2          1990      2
# 5  country1 1994        1     3          1990      3
# 6  country1 1995        1     0          1994      0
# 7  country1 1996        0     0          1995      0
# 8  country1 1997        0     1          1995      1
# 9  country1 1998        0     2          1995      2
# 10 country1 1999        0     3          1995      3
# 11 country1 2014        0    15          1995     18
# 12 country2 1990        0     0          1989      0
# 13 country2 1991        1     1          1989      1
# 14 country2 1992        0     0          1991      0
# 15 country2 1995        1     3          1991      3
# 16 country2 1996        0     0          1995      0
# 17 country2 2000        1     4          1995      4
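The 15 versus 18 difference in row 11 can be checked by hand. The following is a minimal sketch of that arithmetic, using only the two years from the row in question; the variable names are illustrative.

```r
year          <- 2014L   # the row being scored
last_conflict <- 1995L   # most recent conflict year before that row

# Calendar years strictly between the last conflict and the current row,
# including years that have no observation (1996 through 2013).
years_between <- seq(last_conflict + 1L, year - 1L)
length(years_between)            # 18

# The same count without enumerating the years.
year - last_conflict - 1L        # 18
```

The expected value of 15 happens to equal 2014 - 1999, the gap to the previous observed row, which may be where it came from; that is a guess, since the question does not say how it was computed.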

Data

quux <- structure(list(country = c("country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country2", "country2", "country2", "country2", "country2", "country2"), year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2014L, 1990L, 1991L, 1992L, 1995L, 1996L, 2000L), conflict = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L), peace = c(0L, 0L, 1L, 2L, 3L, 0L, 0L, 1L, 2L, 3L, 15L, 0L,  1L, 0L, 3L, 0L, 4L)), class = "data.frame", row.names = c(NA, -17L))

Answer 2

We can use cumsum and lag together to create a helper column for grouping, then simply take the difference between year and first(year) within each group:

library(dplyr)

df1 %>% 
  mutate(helper = cumsum(lag(conflict, default = 1) == 1), .by = country) %>% 
  mutate(peace = year - first(year), .by = c(country, helper)) %>% 
  select(-helper)

#>     country year conflict peace
#> 1  country1 1990        1     0
#> 2  country1 1991        0     0
#> 3  country1 1992        0     1
#> 4  country1 1993        0     2
#> 5  country1 1994        1     3
#> 6  country1 1995        1     0
#> 7  country1 1996        0     0
#> 8  country1 1997        0     1
#> 9  country1 1998        0     2
#> 10 country1 1999        0     3
#> 11 country1 2014        0    18
#> 12 country2 1990        0     0
#> 13 country2 1991        1     1
#> 14 country2 1992        0     0
#> 15 country2 1995        1     3
#> 16 country2 1996        0     0
#> 17 country2 2000        1     4
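To see what the helper column contains before it is dropped, here is a base R sketch for the country1 rows only. It is a re-expression for illustration, not the answer's own code: `c(1L, head(conflict, -1))` stands in for `lag(conflict, default = 1)`, and `ave()` stands in for the grouped `first(year)`.

```r
# country1 rows from the question
year     <- c(1990:1999, 2014L)
conflict <- c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L)

# Previous row's conflict flag; the first row is treated as if a
# conflict preceded it, so a new block starts there.
prev_conflict <- c(1L, head(conflict, -1))

# A new block starts on every row that follows a conflict year.
helper <- cumsum(prev_conflict == 1)

# Years elapsed since the first row of each block.
block_start <- ave(year, helper, FUN = function(y) rep(y[1], length(y)))
peace <- year - block_start

data.frame(year, conflict, helper, peace)
```

Because each block begins in the first observed row after a conflict, this counts from that row's year. It matches `year - last_conflict - 1` only when the year immediately after each conflict is present in the data.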

Created on 2024-02-22 with reprex v2.0.2

Answer 3

@r2evans

Thank you for your answer. It already works well, but there are a few small errors.

The output from my code is as follows:

No.	country	year	conflict	peace
1	Iraq	1985	0	-34
2	Iraq	1989	0	-30
3	Iraq	1990	0	-29
4	Iraq	1991	1	-1
5	Iraq	1992	0	0
6	Iraq	1993	0	1
7	Iraq	1994	0	2
8	Iraq	1995	1	-1
9	Iraq	1996	0	0
10	Iraq	1997	0	1
11	Iraq	1998	0	2
12	Iraq	1999	0	3
13	Iraq	2000	0	4
14	Iraq	2001	0	5
15	Iraq	2002	0	6
16	Iraq	2010	0	14
17	Iraq	2011	0	15
18	Iraq	2012	0	16
19	Iraq	2013	0	17
20	Iraq	2014	0	18
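The code that produced this output is not shown, so the cause cannot be confirmed. Two things do follow from the numbers. A value of -1 on a conflict year is what `year - last_conflict - 1` gives when `last_conflict` equals the current year, which is what happens if the `lag()` step is left out. The leading values (-34, -30, -29) all correspond to a `last_conflict` of 2018, a year that does not appear in these rows and may have been carried over from another group. As a cross-check, here is a base R sketch of the rule from the first answer applied to these rows; `peace_by_group` is a hypothetical helper, and it assumes rows are sorted by year within each country.

```r
# Iraq rows as posted above
iraq <- data.frame(
  country  = "Iraq",
  year     = c(1985L, 1989:2002, 2010:2014),
  conflict = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, rep(0L, 12))
)

# Hypothetical helper: years since the most recent conflict strictly
# before each row. Before any conflict, counting starts from the year
# preceding the group's first observation (the same assumption as
# default = min(year) - 1).
peace_by_group <- function(year, conflict) {
  last <- min(year) - 1L
  out <- integer(length(year))
  for (i in seq_along(year)) {
    out[i] <- year[i] - last - 1L
    if (conflict[i] == 1) last <- year[i]  # takes effect from the next row
  }
  out
}

# Apply the helper separately within each country.
iraq$peace <- ave(
  seq_len(nrow(iraq)), iraq$country,
  FUN = function(idx) peace_by_group(iraq$year[idx], iraq$conflict[idx])
)

iraq
```

Under these assumptions the rows from 1992 onward keep the values shown above except for 1995, which becomes 3, while 1985 through 1991 become 0, 4, 5 and 6.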
