I have this dataset:
df <- data.frame(Date = c("12-01-2019","12-01-2019","12-02-2019","12-02-2019","12-02-2019","12-03-2019"),
                 Country = c("France","USA","France","USA","Colombia","USA"))
I would like to apply cumsum with dplyr and get the following result:
Date Country cumsum
"12-01-2019" "France" 1
"12-01-2019" "USA" 1
"12-01-2019" "Colombia" 0
"12-02-2019" "France" 2
"12-02-2019" "USA" 2
"12-02-2019" "Colombia" 1
"12-03-2019" "France" 2
"12-03-2019" "USA" 3
"12-03-2019" "Colombia" 1
Any suggestions?
Thank you very much for your help.
Regards!
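We can count the number of rows for each combination of Date and Country, complete the missing dates for each Country and fill their counts with 0. Finally, for each Country, we can take the cumsum of the counts.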
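library(dplyr)
df %>%
  # parse the month-day-year strings into Date objects
  mutate(Date = lubridate::mdy(Date)) %>%
  # one row per observed Date/Country pair, with the row count in n
  count(Date, Country) %>%
  # add the Date/Country pairs that never occur, with n = 0
  tidyr::complete(Country, Date = seq(min(Date), max(Date), by = 'day'),
                  fill = list(n = 0)) %>%
  # running total of n within each country
  group_by(Country) %>%
  mutate(n = cumsum(n))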
# Country Date n
# <chr> <date> <dbl>
#1 Colombia 2019-12-01 0
#2 Colombia 2019-12-02 1
#3 Colombia 2019-12-03 1
#4 France 2019-12-01 1
#5 France 2019-12-02 2
#6 France 2019-12-03 2
#7 USA 2019-12-01 1
#8 USA 2019-12-02 2
#9 USA 2019-12-03 3
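
To see what each step contributes, the same pipeline can be run one stage at a time. This is only a sketch and assumes dplyr, tidyr and lubridate are installed. The names counts, filled and result are illustrative, and the arrange() and ungroup() calls are additions that are not part of the answer above.

```r
library(dplyr)

df <- data.frame(
  Date = c("12-01-2019", "12-01-2019", "12-02-2019",
           "12-02-2019", "12-02-2019", "12-03-2019"),
  Country = c("France", "USA", "France", "USA", "Colombia", "USA")
)

# Step 1: one row per observed Date/Country pair, with its row count in n
counts <- df %>%
  mutate(Date = lubridate::mdy(Date)) %>%
  count(Date, Country)

# Step 2: add the Date/Country pairs that never occurred, with n = 0
filled <- counts %>%
  tidyr::complete(Country, Date = seq(min(Date), max(Date), by = "day"),
                  fill = list(n = 0))

# Step 3: running total within each country
# (arrange() is an added precaution, since cumsum() depends on row order)
result <- filled %>%
  arrange(Country, Date) %>%
  group_by(Country) %>%
  mutate(n = cumsum(n)) %>%
  ungroup()

print(result)
```

Here counts holds only the six pairs that appear in df, while filled has nine rows (three countries times three days), so a country with no rows on a given day still carries its running total forward.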