How to count unique IDs over the most recent n days


Say I want to count, for each day, the unique IDs seen over the most recent 15 days. Here is the code:

library(tidyverse)
library(lubridate)
set.seed(1)
# 300 rows spread over 100 consecutive days, with ids drawn from the 26 letters
eg <- tibble(day = sample(seq(ymd('2018-01-01'), length.out = 100, by = 'day'), 300, replace = TRUE),
             id = sample(letters[1:26], 300, replace = TRUE),
             value = rnorm(300))

eg %>% 
  group_by(day) %>% 
  summarise(uniqu_id = n_distinct(id),
            recent_15_days_unique_id = 'how',  # placeholder: the column I don't know how to compute
            day_total = sum(value))

The result is:

# A tibble: 95 x 4
   day        uniqu_id recent_15_days_unique_id day_total
   <date>        <int> <chr>                        <dbl>
 1 2018-01-01        3 how                         -1.38 
 2 2018-01-02        3 how                          2.01 
 3 2018-01-03        3 how                          1.57 
 4 2018-01-04        6 how                         -1.64 
 5 2018-01-05        2 how                         -0.293
 6 2018-01-06        4 how                         -2.08 

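For the 'recent_15_days_unique_id' column, the first row should count the unique IDs between "day - 15" and "day", that is, between '2017-12-17' and '2018-01-01'; the second row should cover '2017-12-18' to '2018-01-02'. It is similar to a 'rollsum' function, but it counts distinct values instead of summing.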
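To make the window definition concrete, here is a minimal sketch for a single day. The data frame `toy` and the variable `target` are hypothetical and are not part of the question's `eg` data. Note that a window running from `day - 15` through `day`, both ends included, spans 16 calendar dates.

```r
# Hypothetical toy data, only to illustrate the window definition
toy <- data.frame(
  day = as.Date(c("2018-01-01", "2018-01-10", "2018-01-16", "2018-01-17")),
  id  = c("a", "b", "a", "c")
)

target <- as.Date("2018-01-17")

# Rows whose day falls in [target - 15, target], both ends included
in_window <- toy$day >= target - 15 & toy$day <= target

# Distinct ids seen inside the window
window_ids <- unique(toy$id[in_window])
n_window_ids <- length(window_ids)

# Number of calendar dates covered by the window
n_window_days <- length(seq(target - 15, target, by = "day"))
```

Here the row dated '2018-01-01' falls just outside the window, so only the ids from the remaining three rows are counted.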

Tags: r, tidyverse

1 answer

We can ungroup and then, for each day, build the sequence of dates from day - 15 to day and count all the unique ids that fall within that span.

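library(dplyr)

eg %>% 
   group_by(day) %>% 
   summarise(uniqu_id = n_distinct(id),
             day_total = sum(value)) %>%
   ungroup() %>%
   rowwise() %>%   # evaluate the following mutate() one day at a time
   # count distinct ids in the original data whose day lies in [day - 15, day]
   mutate(recent_15_days_unique_id = 
    n_distinct(eg$id[eg$day %in% seq(day - 15, day, by = "1 day")]))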



 #   day        uniqu_id day_total recent_15_days_unique_id
 # <date>        <int>     <dbl>                    <int>
 #1 2018-01-02        2    0.170                         2
 #2 2018-01-03        2   -0.460                         3
 #3 2018-01-04        1   -1.53                          3
 #4 2018-01-05        2    1.67                          5
 #5 2018-01-06        2    1.52                          6
 #6 2018-01-07        4   -1.62                         10
 #7 2018-01-08        2   -0.0190                       12
 #8 2018-01-09        1   -0.573                        12
 #9 2018-01-10        2   -0.220                        13
#10 2018-01-11        7   -1.73                         14

Using the same logic, we can also compute it separately with sapply:

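new_eg <- eg %>% 
         group_by(day) %>% 
         summarise(uniqu_id = n_distinct(id),
                   day_total = sum(value)) %>%
         ungroup()


# For each summarised day x, count distinct ids whose day lies in [x - 15, x].
# Dates are compared with dates directly, as in the rowwise() version.
sapply(new_eg$day, function(x) 
   n_distinct(eg$id[eg$day %in% seq(x - 15, x, by = "1 day")]))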

#[1]  2  3  3  5  6 10 12 12 13 14 15 16 17 17 18 20 21 22 22 20 20 21 21 .....
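
Since the question asks about the most recent n days in general, the same idea can be wrapped in a small helper with the window size as a parameter. This is only a sketch under stated assumptions: the function name `count_recent_unique` and its arguments are hypothetical, it uses base R only, and it tests the date range directly instead of building a sequence with seq().

```r
# Hypothetical helper: for each date in `at`, count the distinct `ids`
# whose `days` value lies in [at - n, at], both ends included.
count_recent_unique <- function(days, ids, at, n = 15) {
  vapply(seq_along(at), function(i) {
    in_window <- days >= at[i] - n & days <= at[i]
    length(unique(ids[in_window]))
  }, integer(1))
}

# Hypothetical toy data, not the `eg` tibble from the question
toy_days <- as.Date("2018-01-01") + c(0, 0, 5, 20, 20)
toy_ids  <- c("a", "b", "c", "d", "a")
toy_at   <- sort(unique(toy_days))

res_15 <- count_recent_unique(toy_days, toy_ids, toy_at, n = 15)
res_3  <- count_recent_unique(toy_days, toy_ids, toy_at, n = 3)
```

With the question's data, a call along the lines of count_recent_unique(eg$day, eg$id, new_eg$day) should give the same counts as the sapply version above. Indexing with seq_along() keeps each element of `at` a Date, and vapply() guarantees an integer vector as the result.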