I have a dataset named loc_prime2 that looks like this:
Document.Name locale Arrival Leg.Number no_legs
VCH028735 DENVER_COLORADO 12/2/2018 1 2
VCH028735 _NONE 12/7/2018 2 2
VCH028776 HARLINGEN_TEXAS 12/2/2018 1 3
VCH028776 LUBBOCK_TEXAS 12/3/2018 2 3
VCH028776 NONE 12/4/2018 3 3
VCH030440 MEMPHIS_TENNESSEE 5/12/2019 1 6
VCH030440 NASHVILLE_TENNESSEE 5/13/2019 2 6
VCH030440 KNOXVILLE_TENNESSEE 5/14/2019 3 6
VCH030440 CHATTANOOGA_TENNESSEE 5/15/2019 4 6
VCH030440 NASHVILLE_TENNESSEE 5/16/2019 5 6
VCH030440 Kennesaw, 5/18/2019 6 6
VCH031580 EUGENE_OREGON 7/8/2019 1 8
VCH031580 NEWPORT_OREGON 7/9/2019 2 8
VCH031580 CORVALLIS_OREGON 7/10/2019 3 8
VCH031580 EUGENE_OREGON 7/11/2019 4 8
VCH031580 EUREKA_CALIFORNIA 7/12/2019 5 8
VCH031580 REDDING_CALIFORNIA 7/15/2019 6 8
VCH031580 SACRAMENTO_CALIFORNIA 7/16/2019 7 8
VCH031580 _NONE 7/17/2019 8 8
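I want to add a new column that holds the arrival date following the current arrival date. Depending on the trip's no_legs, this has to be done a different number of times. For example, the first Document.Name is in Denver on 12/2; the next locale associated with that Document.Name is _NONE, meaning there is no destination after Denver. So the rows for VCH028735 should collapse to: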
Document.Name locale Arrival End
VCH028735 DENVER_COLORADO 12/2/2018 12/7/2018
Note that some trips have more than two legs; a trip can have up to 8. For example, VCH031580 should collapse to:
Document.Name locale Arrival End
VCH031580 EUGENE_OREGON 7/8/2019 7/9/2019
VCH031580 NEWPORT_OREGON 7/9/2019 7/10/2019
VCH031580 CORVALLIS_OREGON 7/10/2019 7/11/2019
VCH031580 EUGENE_OREGON 7/11/2019 7/12/2019
VCH031580 EUREKA_CALIFORNIA 7/12/2019 7/15/2019
VCH031580 REDDING_CALIFORNIA 7/15/2019 7/16/2019
VCH031580 SACRAMENTO_CALIFORNIA 7/16/2019 7/17/2019
My attempt for the case where no_legs is 2 looks like this:
test <- as.data.frame(loc_prime2 %>%
  group_by(Document.Name) %>%
  mutate(
    end1 = as.Date(ifelse(Leg.Number == 1 & no_legs == 2, lead(Arrival), 0),
                   origin = '1970-01-01')
  ) # end mutate
)
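But to handle the different values of no_legs, I think I need a loop or something similar. I'm sure there is a simple way to do what I want; I just can't see it. Any ideas?
Thanks in advance.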
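I think you are making this harder than it needs to be by taking the number of legs per group into account. Assuming your arrival dates are sorted chronologically, all you need to do is group by Document.Name and then use lead to create the new End variable. After that, you just drop the last row of each group (the one where End is NA):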
library(dplyr)
loc_prime2 %>%
group_by(Document.Name) %>%
mutate(End = lead(Arrival)) %>%
select(Document.Name, locale, Arrival, End, Leg.Number) %>%
filter(!is.na(End))
#> # A tibble: 15 x 5
#> # Groups: Document.Name [4]
#> Document.Name locale Arrival End Leg.Number
#> <chr> <chr> <chr> <chr> <int>
#> 1 VCH028735 DENVER_COLORADO 12/2/2018 12/7/2018 1
#> 2 VCH028776 HARLINGEN_TEXAS 12/2/2018 12/3/2018 1
#> 3 VCH028776 LUBBOCK_TEXAS 12/3/2018 12/4/2018 2
#> 4 VCH030440 MEMPHIS_TENNESSEE 5/12/2019 5/13/2019 1
#> 5 VCH030440 NASHVILLE_TENNESSEE 5/13/2019 5/14/2019 2
#> 6 VCH030440 KNOXVILLE_TENNESSEE 5/14/2019 5/15/2019 3
#> 7 VCH030440 CHATTANOOGA_TENNESSEE 5/15/2019 5/16/2019 4
#> 8 VCH030440 NASHVILLE_TENNESSEE 5/16/2019 5/18/2019 5
#> 9 VCH031580 EUGENE_OREGON 7/8/2019 7/9/2019 1
#> 10 VCH031580 NEWPORT_OREGON 7/9/2019 7/10/2019 2
#> 11 VCH031580 CORVALLIS_OREGON 7/10/2019 7/11/2019 3
#> 12 VCH031580 EUGENE_OREGON 7/11/2019 7/12/2019 4
#> 13 VCH031580 EUREKA_CALIFORNIA 7/12/2019 7/15/2019 5
#> 14 VCH031580 REDDING_CALIFORNIA 7/15/2019 7/16/2019 6
#> 15 VCH031580 SACRAMENTO_CALIFORNIA 7/16/2019 7/17/2019 7
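The approach above relies on the rows already being in chronological order within each Document.Name. If that is not guaranteed, a minimal sketch like the following might help. It assumes Arrival is stored as month/day/year text and that Leg.Number reflects the travel order; loc_sample and result are hypothetical names made up for this illustration, and the sample rows are copied from the question and shuffled on purpose.

```r
library(dplyr)

# Hypothetical sample: two trips from the question, deliberately out of order
loc_sample <- data.frame(
  Document.Name = c("VCH028776", "VCH028735", "VCH028776", "VCH028735", "VCH028776"),
  locale        = c("NONE", "_NONE", "HARLINGEN_TEXAS", "DENVER_COLORADO", "LUBBOCK_TEXAS"),
  Arrival       = c("12/4/2018", "12/7/2018", "12/2/2018", "12/2/2018", "12/3/2018"),
  Leg.Number    = c(3L, 2L, 1L, 1L, 2L),
  stringsAsFactors = FALSE
)

result <- loc_sample %>%
  # Assumption: Arrival is month/day/year text; convert it to Date
  mutate(Arrival = as.Date(Arrival, format = "%m/%d/%Y")) %>%
  # Put each trip's legs in travel order before looking ahead
  arrange(Document.Name, Leg.Number) %>%
  group_by(Document.Name) %>%
  # End is the arrival date of the next leg in the same trip
  mutate(End = lead(Arrival)) %>%
  ungroup() %>%
  # The last leg of each trip has no next arrival, so drop it
  filter(!is.na(End)) %>%
  select(Document.Name, locale, Arrival, End)

print(result)
```

Converting to Date also makes it possible to compute stay lengths afterwards with End - Arrival, which would not work on the character columns shown in the output above.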