I have a data frame in which each data point is structured as: ID, measure, timemark
ID measure timemark
001 12 15
003 3 13
004 365 0
003 1 13
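ID is a person's unique learner ID, measure is the number of days that person used the service starting from that time mark, and timemark is a number from 0 to 51 representing the 52 weeks of the year.
Now I want to create a data frame with 52 week columns, each holding the number of days the person spent in the service during that week (so the maximum is 7 days per week). A person can have several entries at the same time mark; in that case the total number of days is the sum of those rows.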
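To make the example reproducible, the sample rows can be entered as an R data frame. This is a sketch; storing ID as an integer (so `001` becomes `1`) is an assumption, chosen because it matches the `<int>` ID column in the printed result further down.

```r
# Sample data from the question. ID is stored as an integer (001 -> 1);
# ID 3 appears twice at time mark 13, so its days must be summed later.
df <- data.frame(
  ID       = c(1L, 3L, 4L, 3L),
  measure  = c(12L, 3L, 365L, 1L),
  timemark = c(15L, 13L, 0L, 13L)
)
```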
So I want the result to look like:
ID ... week13 week14 week15 week16
001 ... 0 0 7 5
003 ... 4 0 0 0
004 ... 7 7 7 7
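I'm struggling with the inner logic. I suspect it has something to do with the quotient and remainder, but I can't work it out. Can anyone help?
Here is one approach using tidyverse commands: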
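The quotient/remainder idea can be checked on a single person first. In this base R sketch, `weekly_days()` is a hypothetical helper, not part of any package: `total %/% 7` gives the number of full 7-day weeks, `total %% 7` gives the days left over for the final week, and anything that would spill past week 51 is dropped.

```r
# Hypothetical helper: spread `total` days over weeks 0-51, starting at
# week `start`, filling 7 days per week until the total runs out.
weekly_days <- function(total, start, n_weeks = 52) {
  days   <- numeric(n_weeks)                       # one slot per week, all 0
  full   <- total %/% 7                            # quotient: full 7-day weeks
  rest   <- total %% 7                             # remainder: days in the last week
  chunks <- c(rep(7, full), if (rest > 0) rest)    # e.g. 12 -> c(7, 5)
  pos    <- start + seq_along(chunks)              # week k is stored at position k + 1
  keep   <- pos <= n_weeks                         # drop anything past week 51
  days[pos[keep]] <- chunks[keep]
  days
}

weekly_days(12, 15)[16:17]  # weeks 15 and 16 -> 7 5
```

Each person then becomes one such 52-element row; summing entries that share a time mark before calling the helper handles cases like ID 003 (3 + 1 = 4 days in week 13).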
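library(dplyr)
library(tidyr)

df %>%
  # add up entries with the same ID and time mark (ID 003: 3 + 1 = 4)
  group_by(ID, timemark) %>%
  summarise(measure = sum(measure)) %>%
  # split each total into full 7-day weeks plus the remainder, e.g. 12 -> c(7, 5)
  mutate(measure = lapply(measure, function(m) c(rep(7, m %/% 7), m %% 7))) %>%
  # one row per week; `offset` is the position inside each split (1, 2, ...)
  unnest_longer(measure, indices_to = "offset") %>%
  mutate(timemark = timemark + offset - 1) %>%
  # keep weeks 0-51 only and drop empty remainders (e.g. 14 -> c(7, 7, 0))
  filter(timemark <= 51, measure > 0) %>%
  select(-offset) %>%
  mutate(timemark = factor(paste0('week', timemark), levels = paste0('week', 0:51))) %>%
  # note: spread() errors if two spans of the same ID land in the same week
  spread(timemark, measure)
# Or using pivot_wider() in newer tidyr (>= 1.0.0) instead of spread():
# pivot_wider(names_from = timemark, values_from = measure)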
# A tibble: 3 x 53
# Groups: ID [3]
#     ID week0 week1 week2 week3 week4 week5 week6 week7 week8 week9 week10 week11 week12 week13 week14 week15 week16
#  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1     1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA     NA     NA     NA     NA      7      5
#2     3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA     NA     NA      4     NA     NA     NA
#3     4     7     7     7     7     7     7     7     7     7     7      7      7      7      7      7      7      7
# … with 35 more variables: week17 <dbl>, week18 <dbl>, week19 <dbl>, week20 <dbl>, week21 <dbl>, week22 <dbl>,
# week23 <dbl>, week24 <dbl>, week25 <dbl>, week26 <dbl>, week27 <dbl>, week28 <dbl>, week29 <dbl>, week30 <dbl>,
# week31 <dbl>, week32 <dbl>, week33 <dbl>, week34 <dbl>, week35 <dbl>, week36 <dbl>, week37 <dbl>, week38 <dbl>,
# week39 <dbl>, week40 <dbl>, week41 <dbl>, week42 <dbl>, week43 <dbl>, week44 <dbl>, week45 <dbl>, week46 <dbl>,
# week47 <dbl>, week48 <dbl>, week49 <dbl>, week50 <dbl>, week51 <dbl>
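If zeros are wanted instead of `NA`, as in the layout the question asks for, the same quotient/remainder idea also works in base R without the tidyverse. This is a sketch, not the answer's method: it re-enters the question's sample data inline so the block runs on its own, and it returns a numeric matrix with one row per ID (row names taken from ID) and one column per week, where weeks without service stay at 0.

```r
# Sample data from the question (ID stored as an integer, so 001 -> 1).
df <- data.frame(
  ID       = c(1L, 3L, 4L, 3L),
  measure  = c(12L, 3L, 365L, 1L),
  timemark = c(15L, 13L, 0L, 13L)
)

# Sum entries that share an ID and time mark (ID 3: 3 + 1 = 4 at week 13).
totals <- aggregate(measure ~ ID + timemark, data = df, FUN = sum)

# One row per ID, one column per week, every cell starting at 0.
ids <- sort(unique(totals$ID))
out <- matrix(0, nrow = length(ids), ncol = 52,
              dimnames = list(ids, paste0("week", 0:51)))

for (i in seq_len(nrow(totals))) {
  m     <- totals$measure[i]
  chunk <- c(rep(7, m %/% 7), if (m %% 7 > 0) m %% 7)  # e.g. 12 -> c(7, 5)
  cols  <- totals$timemark[i] + seq_along(chunk)        # week k sits in column k + 1
  keep  <- cols <= 52                                   # drop days past week51
  row   <- match(totals$ID[i], ids)
  # add rather than assign, so overlapping spans of one ID accumulate
  out[row, cols[keep]] <- out[row, cols[keep]] + chunk[keep]
}

out[, c("week13", "week14", "week15", "week16")]
#   week13 week14 week15 week16
# 1      0      0      7      5
# 3      4      0      0      0
# 4      7      7      7      7
```

Adding into the matrix, rather than assigning, means two spans of the same ID that reach the same week are summed; whether such totals should then be capped at 7 is not specified in the question.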