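I have a panel dataset in R that contains observations for each group over time (months). The following data frame is a snapshot of the full data frame: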
df <- data.frame(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                 month = c("January", "January", "January", "February", "February", "February",
                           "March", "March", "March", "January", "January",
                           "February", "February", "March", "March"),
                 first_value = c("A","BC","D", NA,NA,NA, "D","G","H", "K","L", NA,NA, "DE","GH"),
                 second_value = c(1,5,7, NA,NA,NA, 2,3,9, 7,1, NA,NA, 4,4))
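The dataset is already sorted by group and time. As you can see, for a group in a given month the observations ("first_value" and "second_value") can be completely empty (here February, but it could be any month of a group except its first and last month). What I want is for each empty month to be filled with the last non-empty previous month within the same group.
I would like to obtain the following data frame: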
df_filled <- data.frame(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                        month = c("January", "January", "January", "February", "February", "February",
                                  "March", "March", "March", "January", "January",
                                  "February", "February", "March", "March"),
                        first_value = c("A","BC","D", "A","BC","D", "D","G","H", "K","L", "K","L", "DE","GH"),
                        second_value = c(1,5,7, 1,5,7, 2,3,9, 7,1, 7,1, 4,4))
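Note that, by construction, the last non-empty previous month always has the same number of observations as the empty month(s) that follow it.
I have tried various commands with fill() from the tidyr package and na.locf() from the zoo package, but all I managed was to fill every empty row with the last row of the last non-empty previous month, which gives: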
df_filled <- data.frame(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                        month = c("January", "January", "January", "February", "February", "February",
                                  "March", "March", "March", "January", "January",
                                  "February", "February", "March", "March"),
                        first_value = c("A","BC","D", "D","D","D", "D","G","H", "K","L", "L","L", "DE","GH"),
                        second_value = c(1,5,7, 7,7,7, 2,3,9, 7,1, 1,1, 4,4))
Looking forward to your suggestions. Thank you.
An approach using row_number(), assuming there are never two consecutive months with NA:
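library(dplyr)  # the .by argument requires dplyr >= 1.1.0

df %>%
  # number of empty rows in each group-month (0 for non-empty months)
  mutate(n_na = sum(is.na(first_value)), .by = c(group, month)) %>%
  # an empty row takes the value n_na rows earlier in its group, i.e. the same
  # slot of the previous month; the helper column n_na is then dropped
  mutate(across(ends_with("_value"),
                ~ if_else(is.na(.x), .x[row_number() - n_na], .x)),
         .by = group,
         n_na = NULL)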
group month first_value second_value
1 1 January A 1
2 1 January BC 5
3 1 January D 7
4 1 February A 1
5 1 February BC 5
6 1 February D 7
7 1 March D 2
8 1 March G 3
9 1 March H 9
10 2 January K 7
11 2 January L 1
12 2 February K 7
13 2 February L 1
14 2 March DE 4
15 2 March GH 4
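A base-R sketch that drops the "no two consecutive empty months" assumption: number the rows within each group-month, then carry values forward within each (group, position) series. It relies on the guarantee stated in the question (an empty month has as many rows as the last non-empty month before it) and on the rows already being in time order. It also assumes NAs occur only in fully empty months, since any NA cell would be filled. The helper names locf() and fill_empty_months() are hypothetical, not from any package.

```r
# Hypothetical helpers locf() and fill_empty_months(); base R only.
df <- data.frame(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                 month = c("January", "January", "January", "February", "February", "February",
                           "March", "March", "March", "January", "January",
                           "February", "February", "March", "March"),
                 first_value = c("A","BC","D", NA,NA,NA, "D","G","H", "K","L", NA,NA, "DE","GH"),
                 second_value = c(1,5,7, NA,NA,NA, 2,3,9, 7,1, NA,NA, 4,4))

# Last observation carried forward: each NA takes the most recent earlier
# non-NA value; leading NAs stay NA.
locf <- function(x) {
  ok <- !is.na(x)
  idx <- cummax(seq_along(x) * ok)  # index of last non-NA seen so far (0 = none)
  idx[idx == 0] <- NA
  x[idx]
}

# Assumes rows are in time order within each group and that an empty month
# has the same number of rows as the last non-empty month before it.
fill_empty_months <- function(data, value_cols) {
  # Position of each row inside its (group, month) block: 1, 2, 3, ...
  pos <- ave(seq_len(nrow(data)), data$group, data$month, FUN = seq_along)
  for (col in value_cols) {
    # Within each (group, position) series the rows are in time order, so
    # LOCF copies the value from the same slot of the last non-empty month.
    data[[col]] <- ave(data[[col]], data$group, pos, FUN = locf)
  }
  data
}

res <- fill_empty_months(df, c("first_value", "second_value"))
print(res)
```

Because the fill runs along each position series instead of offsetting row numbers, two or more consecutive empty months all pick up the values of the last non-empty month. The same idea should carry over to tidyr: add the within-month position, group by group and that position, and call fill() with .direction = "down".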