How to create a new variable with different conditions in different groups in R using dplyr


I want to add a new variable to a data frame, using a different condition for each group. My data looks like this:

library(dplyr)

test <- data.frame(country = rep(letters[1:5], each = 10),
                   time = seq(from = as.Date('2020-01-01'), to = as.Date('2020-02-19'), by = 'day')) %>% mutate(time = as.Date(time))

lockdown_time <- data.frame(country = letters[1:4],
                            start_time = c('2020-01-06', '2020-01-16', '2020-01-26', '2020-02-05'),
                            end_time = c('2020-01-08','2020-01-18','2020-01-28','2020-02-07')) 

Taking country == 'a' as an example:

# use country a as an example 

test_a <- test  %>%  filter(country == 'a')

start_time_a <- lockdown_time[1,2] %>% as.Date()

end_time_a <- lockdown_time[1,3] %>% as.Date()


test_a %>% mutate(lockdown = case_when(between(time, start_time_a, end_time_a) ~ 1, TRUE ~ 0))

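I know how to add the new variable lockdown one country at a time, but I'd like to know whether there is an efficient way to do it for all countries at once. Note that country 'e' does not appear in the lockdown_time data frame, so the lockdown variable for country == 'e' should be all NA.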

r dplyr tidyverse purrr
1 Answer

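Join the lockdown windows onto test with left_join, then compare each row's time with its own country's start_time and end_time. Countries missing from lockdown_time (here 'e') get NA dates from the join, and the is.na() branch keeps lockdown as NA for them: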

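test %>% 
  left_join(lockdown_time, by = "country") %>% 
  mutate(
    start_time = as.Date(start_time),
    end_time   = as.Date(end_time)
  ) %>% 
  mutate(
    lockdown = case_when(
      is.na(start_time) ~ NA_real_,               # no lockdown window (country 'e')
      time >= start_time & time <= end_time ~ 1,  # inside this country's window
      TRUE ~ 0                                    # outside the window
    )
  )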

You get:

   country       time start_time   end_time lockdown
1        a 2020-01-01 2020-01-06 2020-01-08        0
2        a 2020-01-02 2020-01-06 2020-01-08        0
3        a 2020-01-03 2020-01-06 2020-01-08        0
4        a 2020-01-04 2020-01-06 2020-01-08        0
5        a 2020-01-05 2020-01-06 2020-01-08        0
6        a 2020-01-06 2020-01-06 2020-01-08        1
7        a 2020-01-07 2020-01-06 2020-01-08        1
...
47       e 2020-02-16       <NA>       <NA>       NA
48       e 2020-02-17       <NA>       <NA>       NA
49       e 2020-02-18       <NA>       <NA>       NA
50       e 2020-02-19       <NA>       <NA>       NA
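If dplyr is not available, the same join-then-compare idea can be sketched in base R. This is an illustrative equivalent, not part of the original answer: it swaps left_join() for merge(all.x = TRUE), converts the dates once while building the data frames, and relies on comparisons against NA dates returning NA, so country 'e' ends up with NA rather than 0. The name res is just a placeholder.

```r
# Base-R sketch (assumption: merge(all.x = TRUE) stands in for dplyr::left_join())
test <- data.frame(country = rep(letters[1:5], each = 10),
                   time    = seq(as.Date("2020-01-01"), as.Date("2020-02-19"), by = "day"),
                   stringsAsFactors = FALSE)

lockdown_time <- data.frame(country    = letters[1:4],
                            start_time = as.Date(c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05")),
                            end_time   = as.Date(c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07")),
                            stringsAsFactors = FALSE)

# Keep every row of test; countries without a window get NA start/end dates
res <- merge(test, lockdown_time, by = "country", all.x = TRUE)

# NA >= / <= comparisons yield NA, so country 'e' stays NA instead of becoming 0
res$lockdown <- as.numeric(res$time >= res$start_time & res$time <= res$end_time)

# merge() orders rows by the key; sort explicitly so the order is deterministic
res <- res[order(res$country, res$time), ]
print(head(res, 10))
```

Each of countries a to d should get exactly three lockdown days, matching the three-day windows in lockdown_time.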

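In R versions before 4.0.0, data.frame() converts character columns to factors by default, so left_join() warns that it is joining factors with different levels and coerces country to character. To avoid the warning, pass stringsAsFactors = FALSE when creating both data frames.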
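A minimal sketch of that fix for lockdown_time (the same argument applies to test). Converting the date strings once up front is an optional variation, not something the answer requires; as.Date() on a column that is already a Date simply returns it, so the answer's mutate() step still works afterwards.

```r
# stringsAsFactors = FALSE only changes behaviour in R < 4.0.0, where TRUE was the default
lockdown_time <- data.frame(country    = letters[1:4],
                            start_time = c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05"),
                            end_time   = c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07"),
                            stringsAsFactors = FALSE)

# Optional: store the window bounds as Date right away
lockdown_time$start_time <- as.Date(lockdown_time$start_time)
lockdown_time$end_time   <- as.Date(lockdown_time$end_time)

str(lockdown_time)
```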

© www.soinside.com 2019 - 2024. All rights reserved.