My data frame looks like this:
df <- structure(list(country = c("Slovenia", "Slovenia", "Slovenia", "Slovenia", "Slovenia", "Slovenia", "Hungary", "Hungary", "Hungary", "Hungary", "Hungary", "Hungary"), year = c(2000, 2001,
2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005), gov_id = c(NA, 1, NA, NA, 2, NA, NA, 12, NA, NA, 13, NA)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
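I want to fill the gaps with the last non-NA observation within each group (country), but only up to the last non-NA observation of that group. The gov_id values in rows "1", "6", "7" and "12" should therefore stay NA, and the data frame should end up looking like this: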
df_new <- structure(list(country = c("Slovenia", "Slovenia", "Slovenia", "Slovenia", "Slovenia", "Slovenia", "Hungary", "Hungary", "Hungary", "Hungary", "Hungary", "Hungary"), year = c(2000, 2001,
2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005), gov_id = c(NA, 1, 1, 1, 2, NA, NA, 12, 12, 12, 13, NA)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
I have already tried
library(tidyr)
fill(gov_id, direction = "up")
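and similar options, but the problem is that they overwrite the NAs at the start and/or end of each group, which should stay NA.
I would appreciate any suggestion that also works on a larger data set!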
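I think you need to group the data frame by country before calling fill (and note that the direction should be "down" rather than "up", as your df_new shows). A quick fix on top of the fill approach is to set the last row of each group back to NA. This works for the example because the only trailing NA in each country is its last row.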
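library(tidyverse)
df %>%
  group_by(country) %>%                      # fill within each country only
  fill(gov_id, .direction = "down") %>%      # carry the last observed value forward
  mutate(gov_id = ifelse(row_number() == n(), NA, gov_id)) %>%  # reset each group's last row to NA
  ungroup()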
# A tibble: 12 × 3
country year gov_id
<chr> <dbl> <dbl>
1 Slovenia 2000 NA
2 Slovenia 2001 1
3 Slovenia 2002 1
4 Slovenia 2003 1
5 Slovenia 2004 2
6 Slovenia 2005 NA
7 Hungary 2000 NA
8 Hungary 2001 12
9 Hungary 2002 12
10 Hungary 2003 12
11 Hungary 2004 13
12 Hungary 2005 NA
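The quick fix above relies on each group ending in exactly one NA row. If a group could end in several NAs, or in an observed value, a slightly more general variant may be safer. The following is only a sketch, assuming the rows are already ordered by year within each country; the helper columns last_obs and orig_pos are hypothetical names introduced for illustration. It records the position of the last observed gov_id per group before filling, then resets everything after that position to NA.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt compactly
df <- data.frame(
  country = rep(c("Slovenia", "Hungary"), each = 6),
  year    = rep(2000:2005, times = 2),
  gov_id  = c(NA, 1, NA, NA, 2, NA, NA, 12, NA, NA, 13, NA)
)

df_filled <- df %>%
  group_by(country) %>%
  mutate(
    # position of the last originally observed value (0 if the group is all NA)
    last_obs = if (any(!is.na(gov_id))) max(which(!is.na(gov_id))) else 0L,
    # position of each row within its group
    orig_pos = row_number()
  ) %>%
  fill(gov_id, .direction = "down") %>%   # leading NAs stay NA: nothing to carry forward
  mutate(gov_id = if_else(orig_pos > last_obs, NA_real_, gov_id)) %>%  # restore trailing NAs
  ungroup() %>%
  select(-last_obs, -orig_pos)
```

For the example data this should give the same gov_id column as df_new. Because the trailing rows are identified from the original values rather than from their position at the end of the group, a group whose last row holds a real observation is left untouched.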