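I am trying to use the UCDP Battle-Related Deaths dataset, named BattleDeaths_v22_1_conf, from https://ucdp.uu.se/downloads/ (see UCDP Battle-Related Deaths Dataset version 23.1).
I want to create a new variable or dataset that contains only countries with 1000 or more battle deaths in each of 2 consecutive years, and only after 2008. However, I end up with a variable that has no observations.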
I used the dataset's "country" variable (location_inc) and the battle deaths variable (bd_best).
This is what I have done in R so far:
library(dplyr)
filtered_data <- subset(dput(BattleDeaths_v22_1_conf), bd_best >= 1000 & year >= 2008)
filtered_data <- filtered_data %>%
  arrange(location_inc, year) %>%
  group_by(location_inc) %>%
  mutate(sum_deaths_two_years = lag(bd_best) + bd_best)
So far so good.
final_data <- filtered_data %>%
  group_by(location_inc) %>%
  filter(all(sum_deaths_two_years >= 2000))
Now I get a variable with 0 observations. However, I can see in the original dataset that some observations do meet my criteria.
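The empty result can be reproduced on a few made-up rows. This is a minimal sketch, assuming toy values rather than real UCDP figures: within each group, `lag()` returns `NA` for the first row, so `all()` over that column yields `NA` or `FALSE` but never `TRUE`, and `filter()` keeps only rows whose condition is `TRUE`.

```r
library(dplyr)

# Toy rows (hypothetical values, not real UCDP figures)
toy <- tibble::tribble(
  ~location_inc, ~year, ~bd_best,
  "A", 2009L, 1500L,
  "A", 2010L, 1200L,
  "A", 2011L, 1100L
)

with_sum <- toy %>%
  arrange(location_inc, year) %>%
  group_by(location_inc) %>%
  mutate(sum_deaths_two_years = lag(bd_best) + bd_best)

# The first row of each group has no previous row, so its sum is NA
first_sum <- with_sum$sum_deaths_two_years[1]

# all() over a vector that contains NA returns NA or FALSE, never TRUE
group_test <- all(with_sum$sum_deaths_two_years >= 2000)

# filter() drops rows whose condition is NA, so the whole group disappears
n_kept <- with_sum %>%
  filter(all(sum_deaths_two_years >= 2000)) %>%
  nrow()
```

Here every year in group "A" is above 1000, yet `n_kept` is 0. A second effect is worth checking too: because rows below 1000 are removed before `lag()` is applied, the "previous" row is not necessarily the previous calendar year.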
Try this:
library(dplyr)
# Data ------------------------------
example_df <- tibble::tribble(
  ~location_inc, ~year, ~bd_best,
  "Iraq", 2009L, 1036L,
  "Iraq", 2010L, 989L,
  "Iraq", 2011L, 864L,
  "Iraq", 2012L, 565L,
  "Iraq", 2013L, 1870L, # Desired
  "Iraq", 2014L, 13761L, # Desired
  "Iraq", 2015L, 10981L, # Desired
  "Iraq", 2016L, 9775L, # Desired
  "Iraq", 2017L, 10025L, # Desired
  "Iraq", 2018L, 866L,
  "Iraq", 2019L, 498L,
  "Iraq", 2020L, 671L,
  "Iraq", 2021L, 707L,
  "Iraq", 2022L, 335L,
  "Sudan", 2009L, 353L,
  "Sudan", 2010L, 1010L, # Desired
  "Sudan", 2011L, 1404L, # Desired
  "Sudan", 2012L, 1173L, # Desired
  "Sudan", 2013L, 594L,
  "Sudan", 2014L, 856L,
  "Sudan", 2015L, 1264L, # Desired
  "Sudan", 2016L, 1309L, # Desired
  "Sudan", 2017L, 160L,
  "Sudan", 2018L, 243L,
  "Sudan", 2020L, 45L,
  "Sudan", 2021L, 31L,
  "Sudan", 2022L, 47L)
# Code ------------------------------
example_df <- filter(
  example_df,
  .by = location_inc,
  bd_best >= 1000,
  lag(bd_best, default = -1) >= 1000 | lead(bd_best, default = -1) >= 1000)
# Outcome ---------------------------
example_df
# A tibble: 10 × 3
   location_inc  year bd_best
   <chr>        <int>   <int>
 1 Iraq          2013    1870
 2 Iraq          2014   13761
 3 Iraq          2015   10981
 4 Iraq          2016    9775
 5 Iraq          2017   10025
 6 Sudan         2010    1010
 7 Sudan         2011    1404
 8 Sudan         2012    1173
 9 Sudan         2015    1264
10 Sudan         2016    1309
Source: https://ucdp.uu.se/downloads/brd/ucdp-brd-dyadic-231-xlsx.zip
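One caveat with the code above: `lag()` and `lead()` compare neighbouring rows, not neighbouring calendar years, and the example data has no Sudan row for 2019. The following is a hedged variant, assuming one row per country and year, that also requires the neighbouring row to be exactly one year away and applies the `year >= 2008` restriction from the question's own code. The helper name `in_consecutive_run` and the toy values are made up for illustration, and `.by` needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical helper: TRUE where a year reaches the threshold and the
# immediately preceding or following calendar year reaches it as well.
in_consecutive_run <- function(year, deaths, threshold = 1000) {
  high <- deaths >= threshold
  # Neighbouring row is high AND exactly one calendar year away
  prev_ok <- dplyr::lag(high, default = FALSE) & (year - dplyr::lag(year)) == 1
  next_ok <- dplyr::lead(high, default = FALSE) & (dplyr::lead(year) - year) == 1
  high & (prev_ok | next_ok)
}

# Toy rows (hypothetical values, not real UCDP figures)
toy <- tibble::tribble(
  ~location_inc, ~year, ~bd_best,
  "A", 2007L, 1500L,
  "A", 2008L, 1200L,
  "A", 2009L, 300L,
  "A", 2010L, 1100L,
  "A", 2012L, 1300L, # no row for 2011, so 2010 and 2012 are not consecutive
  "B", 2015L, 2000L,
  "B", 2016L, 1000L
)

result <- toy %>%
  filter(year >= 2008) %>%          # restrict the period first
  arrange(location_inc, year) %>%   # lag()/lead() depend on row order
  filter(in_consecutive_run(year, bd_best), .by = location_inc)
```

With these toy rows only the two "B" years remain. Filtering on the year first means both years of a pair must fall inside the period; swapping the two `filter()` calls would let 2008 qualify through 2007 instead. If the real file has several rows per country and year, they would need to be summed to one row beforehand; that is an assumption to check against the data, not something shown in the question.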