Fill gaps in a data frame by group, from the first non-NA observation to the last non-NA observation


My data frame looks like this:

df <- structure(list(country = c("Slovenia", "Slovenia", "Slovenia", "Slovenia", "Slovenia", "Slovenia", "Hungary", "Hungary", "Hungary", "Hungary", "Hungary", "Hungary"), year = c(2000, 2001, 
2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005), gov_id = c(NA, 1, NA, NA, 2, NA, NA, 12, NA, NA, 13, NA)),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))

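I want to fill the gaps within each group (country) using the last non-NA observation, but only up to the last non-NA observation of that group. The values of gov_id in rows "1", "6", "7" and "12" should therefore stay NA. The data frame should end up looking like this: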

df_new <- structure(list(country = c("Slovenia", "Slovenia", "Slovenia", "Slovenia", "Slovenia", "Slovenia", "Hungary", "Hungary", "Hungary", "Hungary", "Hungary", "Hungary"), year = c(2000, 2001, 
2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005), gov_id = c(NA, 1, 1, 1, 2, NA, NA, 12, 12, 12, 13, NA)),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))

I have already tried

library(tidyr)

fill(df, gov_id, .direction = "up")

and similar options, but the problem is that this overwrites the NAs at the beginning and/or end of each group, which should stay NA.

I would appreciate any suggestions that also work on a larger scale!

r dplyr tidyverse tidyr
1 Answer

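I think you need to group your data frame by country before calling fill (and note that the direction should be "down" rather than "up", as shown in your df_new). Filling downward leaves the leading NA of each group untouched, but it also fills the trailing NA. A quick fix for the fill approach is to set the last row of each group back to NA. This assumes that every group ends with exactly one trailing NA row, as in your example.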

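library(tidyverse)

df %>% 
  group_by(country) %>%                    # fill within each country separately
  fill(gov_id, .direction = "down") %>%    # carry the last observed value forward
  # reset the last row of each group to NA (assumes exactly one trailing NA per group)
  mutate(gov_id = ifelse(row_number() == n(), NA, gov_id)) %>% 
  ungroup()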

# A tibble: 12 × 3
   country   year gov_id
   <chr>    <dbl>  <dbl>
 1 Slovenia  2000     NA
 2 Slovenia  2001      1
 3 Slovenia  2002      1
 4 Slovenia  2003      1
 5 Slovenia  2004      2
 6 Slovenia  2005     NA
 7 Hungary   2000     NA
 8 Hungary   2001     12
 9 Hungary   2002     12
10 Hungary   2003     12
11 Hungary   2004     13
12 Hungary   2005     NA
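
Since the question asks for something that also works on larger data, here is a rough sketch of a variant that does not rely on each group ending with exactly one NA. It is not part of the original answer. It records the position of the last observed value per group before filling, then blanks out everything after that position. The helper column last_obs is a hypothetical name introduced only for this illustration, and the data frame is rebuilt with data.frame() so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  country = rep(c("Slovenia", "Hungary"), each = 6),
  year    = rep(2000:2005, times = 2),
  gov_id  = c(NA, 1, NA, NA, 2, NA, NA, 12, NA, NA, 13, NA)
)

result <- df %>%
  group_by(country) %>%
  # last_obs (hypothetical helper): row position of the last non-NA value in the
  # group, or 0 if the group has no observed value at all
  mutate(last_obs = if (any(!is.na(gov_id))) max(which(!is.na(gov_id))) else 0L) %>%
  # carry observed values forward; leading NAs are left as they are
  fill(gov_id, .direction = "down") %>%
  # undo the fill for every row after the last observed value
  mutate(gov_id = if_else(row_number() > last_obs, NA_real_, gov_id)) %>%
  ungroup() %>%
  select(-last_obs)

result
```

On the example data this should give the same gov_id column as df_new. Unlike the quick fix, it should also leave a group untouched when its last row already holds an observed value, and it should handle several trailing NAs in a row.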