I want to add a new variable to a data frame, using a different condition for each group. My data looks like this:
library(dplyr)

# seq() over Dates already returns a Date vector, so no extra as.Date() is needed
test <- data.frame(country = rep(letters[1:5], each = 10),
                   time = seq(from = as.Date('2020-01-01'), to = as.Date('2020-02-19'), by = 'day'))
lockdown_time <- data.frame(country = letters[1:4],
                            start_time = c('2020-01-06', '2020-01-16', '2020-01-26', '2020-02-05'),
                            end_time = c('2020-01-08', '2020-01-18', '2020-01-28', '2020-02-07'))
I'll use country == 'a' as an example:
# use country a as an example
test_a <- test %>% filter(country == 'a')
start_time_a <- lockdown_time[1, 2] %>% as.Date()
end_time_a <- lockdown_time[1, 3] %>% as.Date()
test_a %>% mutate(lockdown = case_when(between(time, start_time_a, end_time_a) ~ 1, TRUE ~ 0))
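I know how to add the new variable lockdown for each country one at a time, but I'd like to know whether there is an efficient way to do it for all countries at once. Note that country == 'e' has no row in the lockdown_time data frame, so the lockdown variable created for country == 'e' should be all NA.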
You need a left_join like this:
test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(
    start_time = as.Date(start_time),
    end_time = as.Date(end_time)
  ) %>%
  # compare each row against its own country's window, not the fixed dates of country a
  mutate(
    lockdown = case_when(time >= start_time & time <= end_time ~ 1, TRUE ~ 0))
You get:
country time start_time end_time lockdown
1 a 2020-01-01 2020-01-06 2020-01-08 0
2 a 2020-01-02 2020-01-06 2020-01-08 0
3 a 2020-01-03 2020-01-06 2020-01-08 0
4 a 2020-01-04 2020-01-06 2020-01-08 0
5 a 2020-01-05 2020-01-06 2020-01-08 0
6 a 2020-01-06 2020-01-06 2020-01-08 1
7 a 2020-01-07 2020-01-06 2020-01-08 1
...
47 e 2020-02-16 <NA> <NA> 0
48 e 2020-02-17 <NA> <NA> 0
49 e 2020-02-18 <NA> <NA> 0
50 e 2020-02-19 <NA> <NA> 0
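In this output the rows for e get 0, because case_when() treats an NA condition as not matched and sends those rows to the TRUE ~ 0 branch. The question asks for NA there instead. A minimal sketch of one way to get that, assuming dplyr is installed, is to use if_else(), which returns NA whenever its condition is NA; the data frames are rebuilt with stringsAsFactors = FALSE so the block runs on its own:

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps country as character
test <- data.frame(country = rep(letters[1:5], each = 10),
                   time = seq(from = as.Date("2020-01-01"),
                              to = as.Date("2020-02-19"), by = "day"),
                   stringsAsFactors = FALSE)
lockdown_time <- data.frame(country = letters[1:4],
                            start_time = c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05"),
                            end_time = c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07"),
                            stringsAsFactors = FALSE)

result <- test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(
    start_time = as.Date(start_time),
    end_time = as.Date(end_time),
    # For country e both bounds are NA, so the condition is NA and
    # if_else() yields NA instead of falling through to 0
    lockdown = if_else(time >= start_time & time <= end_time, 1, 0)
  )

head(result, 7)
tail(result, 4)
```

For countries a to d the lockdown values are the same as in the table above; only the rows for e change to NA.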
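You may get a warning from the join about factors with different levels, because data.frame() converts the country columns to factors. To avoid it, add stringsAsFactors = FALSE when creating the data frames (from R 4.0.0 on this is already the default).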
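For comparison, here is a base R sketch of the same lookup without a join. It is an illustrative alternative, not the answer's method: match() finds the position of each row's country in lockdown_time, the start and end dates are indexed by that position, and a country with no entry (here e) gets NA bounds and therefore an NA lockdown value. The dates in lockdown_time are stored as Date up front, which is an assumption of this sketch rather than the question's setup.

```r
test <- data.frame(country = rep(letters[1:5], each = 10),
                   time = seq(from = as.Date("2020-01-01"),
                              to = as.Date("2020-02-19"), by = "day"),
                   stringsAsFactors = FALSE)
lockdown_time <- data.frame(country = letters[1:4],
                            start_time = as.Date(c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05")),
                            end_time = as.Date(c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07")),
                            stringsAsFactors = FALSE)

# Position of each row's country in lockdown_time (NA when it has no entry)
idx <- match(test$country, lockdown_time$country)

# Each row's own lockdown window; indexing with NA gives NA dates
start <- lockdown_time$start_time[idx]
end <- lockdown_time$end_time[idx]

# 1 inside the window, 0 outside, NA when the window is unknown
test$lockdown <- as.numeric(test$time >= start & test$time <= end)

head(test, 7)
```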