Backfilling a previous date in R

Question   Votes: 0   Answers: 2

Let's take a simple data frame

tmp <- structure(list(a = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a", 
"b"), class = "factor"), dt = structure(c(NA, 17287, 17318, NA, 
17379, 17410), class = "Date")), .Names = c("a", "dt"), row.names = c(NA, 
-6L), class = "data.frame")

which gives the following

  a         dt
1 a       <NA>
2 a 2017-05-01
3 a 2017-06-01
4 b       <NA>
5 b 2017-08-01
6 b 2017-09-01

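In my real data this pattern occurs many times. How can I backfill each missing date with the start of the previous month, i.e. one month before the next known date in the same group?

Ideally I would like to do this with dplyr. The closest I have got is combining lubridate::floor_date with dplyr::lead, but that shifts every date and turns the last date in each group into NA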

tmp %>%
  group_by(a) %>%
  mutate(dt = floor_date(lead(dt, 1) - 1, "month"))

# A tibble: 6 x 2
# Groups:   a [2]
  a     dt        
  <fct> <date>    
1 a     2017-04-01
2 a     2017-05-01
3 a     NA        
4 b     2017-07-01
5 b     2017-08-01
6 b     NA 

Any help would be appreciated.

r
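2 Answers
0
votes

You are really close to the answer. You just need the dplyr package in addition to lubridate, and you should only touch the rows where dt is NA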

tmp <- structure(list(a = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"), 
                      dt = structure(c(NA, 17287, 17318, NA, 17379, 17410), class = "Date")),
                 .Names = c("a", "dt"), 
                 row.names = c(NA, -6L), 
                 class = "data.frame")

library(lubridate)
library(dplyr)

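# assign the result back, otherwise printing tmp shows the original data
tmp <- tmp %>%
  group_by(a) %>%
  # only NA rows are replaced: next date in the group minus one month
  mutate(newDT = if_else(is.na(dt), lead(dt) %m-% months(1), dt))
tmp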

# A tibble: 6 x 3
# Groups:   a [2]
  a     dt         newDT     
  <fct> <date>     <date>    
1 a     NA         2017-04-01
2 a     2017-05-01 2017-05-01
3 a     2017-06-01 2017-06-01
4 b     NA         2017-07-01
5 b     2017-08-01 2017-08-01
6 b     2017-09-01 2017-09-01

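I'm not good with Excel-style dates in R, but I think once you get this far you can convert newDT into whatever format you want. (Edit: thanks to @phiver for correcting my code!)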
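As a rough illustration of that last conversion step, base R's format() turns a Date into a character string in a chosen layout. The dates and format strings below are only examples, not something taken from the question.

```r
# Example dates standing in for the newDT column (illustrative values only)
newDT <- as.Date(c("2017-04-01", "2017-05-01"))

# day/month/year as text
dmy_text <- format(newDT, "%d/%m/%Y")

# year-month only
ym_text <- format(newDT, "%Y-%m")

print(dmy_text)
print(ym_text)
```

Note that the result is character, not Date, so do this only at the very end, after all date arithmetic is finished.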


0
votes

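I don't think the currently accepted solution will work if dt has more than one consecutive NA value, because lead() only looks one row ahead.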
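A small sketch of that failure case, using made-up data (dat2 is a hypothetical name) with two leading NA dates in one group and the same lead()-based expression as the other answer:

```r
library(dplyr)
library(lubridate)

# Hypothetical data: group "a" starts with two consecutive missing dates
dat2 <- data.frame(
  a  = c("a", "a", "a", "a"),
  dt = as.Date(c(NA, NA, "2017-05-01", "2017-06-01"))
)

res <- dat2 %>%
  group_by(a) %>%
  mutate(newDT = if_else(is.na(dt), lead(dt) %m-% months(1), dt)) %>%
  ungroup()

# Row 1 looks ahead to row 2, which is itself NA, so it stays NA;
# only row 2, directly before a known date, gets filled.
print(res)
```

Only the NA immediately before a known date is backfilled; any earlier NA in the run is left as it was.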

Here is an alternative. Note that the row order matters:

solution

dat

  a         dt
1 a       <NA>
2 a       <NA>
3 a 2017-05-01
4 a 2017-06-01
5 b       <NA>
6 b 2017-08-01
7 b 2017-09-01

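library(dplyr)
library(tidyr)
library(lubridate)  # provides the months() period constructor used below

dat %>%
  group_by(a) %>%
  # number the known dates within each group; NA rows get NA for now
  mutate(helper = ifelse(is.na(dt), NA, cumsum(!is.na(dt)))) %>%
  # give each run of NAs the helper value of the next known date
  fill(helper, .direction = 'up') %>%
  group_by(a, helper) %>%
  # step back one month per row of distance from the known date
  mutate(dt = coalesce(dt,
                       max(dt, na.rm = TRUE) - months(max(row_number()) - row_number()))) %>%
  # drop the grouping first; otherwise select() keeps the grouping column helper
  ungroup() %>%
  dplyr::select(-helper)

# A tibble: 7 x 2
  a     dt        
  <fct> <date>    
1 a     2017-03-01
2 a     2017-04-01
3 a     2017-05-01
4 a     2017-06-01
5 b     2017-07-01
6 b     2017-08-01
7 b     2017-09-01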
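To see what the helper column is doing, it can help to run only the first two steps of the pipeline and inspect the result. This is just a sketch: dat is rebuilt here with data.frame() so the snippet runs on its own, and steps is a name I made up.

```r
library(dplyr)
library(tidyr)

# Same values as dat, rebuilt so this snippet is self-contained
dat <- data.frame(
  a  = factor(c("a", "a", "a", "a", "b", "b", "b")),
  dt = as.Date(c(NA, NA, "2017-05-01", "2017-06-01",
                 NA, "2017-08-01", "2017-09-01"))
)

steps <- dat %>%
  group_by(a) %>%
  # known dates are numbered 1, 2, ... within the group; NA rows stay NA
  mutate(helper = ifelse(is.na(dt), NA, cumsum(!is.na(dt)))) %>%
  # each run of NAs inherits the number of the next known date below it
  fill(helper, .direction = "up") %>%
  ungroup()

print(steps$helper)
```

Each run of missing dates ends up in the same (a, helper) block as the known date that follows it, which is why the rows have to be in date order: the second mutate() counts rows back from the end of that block.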

data

dat <-structure(list(a = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a", 
"b"), class = "factor"), dt = structure(c(NA, NA, 17287, 17318, 
NA, 17379, 17410), class = "Date")), .Names = c("a", "dt"), row.names = c(NA, 
-7L), class = "data.frame")