Calculating age-standardized rates by subgroup, with confidence intervals, in R

Question  Votes: 4  Answers: 3

I have a data frame that looks like this:

df <- data.frame (
  time = rep(c("2010", "2011", "2012", "2013", "2014"),4),
  age = rep(c("40-44", "45-49", "50-54", "55-59", "60-64"),4),
 weight = rep(c(0.38, 0.23, 0.19, 0.12, 0.08),4),
 ethnic = rep(c(rep("M",5),rep("NM",5)),2),
 gender = c(rep("M",10), rep("F",10)),
 pop = round((runif(10, min = 10000, max = 99999)), digits = 0),
 count = round((runif(10, min = 100, max = 999)), digits = 0)
)
df$rate = df$count / df$pop

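I want to calculate directly age-standardized incidence rates (where rate = count / pop), together with their confidence intervals, for each subgroup. So for each combination of time, gender, ethnicity, and age I would get one standardized rate. Is there a way to do this in R?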

I tried using the function ageadjust.direct from the R package {epitools}, as follows:

library(epitools)

age_adjust_test <- ageadjust.direct(count = df$count, pop = df$pop,
                                    rate = df$rate, stdpop = df$weight)

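The output is the overall adjusted rate, its confidence interval, and the crude rate. Is there a way to get this output for each subgroup?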
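For readers unfamiliar with the method, the point estimates in question can be sketched in a few lines of base R: the adjusted rate is the weighted sum of the age-specific rates, with the standard-population weights normalised to sum to 1. The helper name `direct_rate` and the toy numbers are made up for illustration; the sketch covers only the crude and adjusted rates and leaves the confidence interval to the package.

```r
# Hypothetical helper (not part of epitools): direct standardization
# for ONE subgroup, given one value per age band.
direct_rate <- function(count, pop, stdpop) {
  w    <- stdpop / sum(stdpop)   # normalise standard weights to sum to 1
  rate <- count / pop            # age-specific rates
  c(crude.rate = sum(count) / sum(pop),  # ignores the age structure
    adj.rate   = sum(w * rate))          # weighted by the standard population
}

# Toy data, invented for illustration: two age bands
res <- direct_rate(count  = c(10, 20),
                   pop    = c(1000, 1000),
                   stdpop = c(0.75, 0.25))
print(res)
```

Here the age-specific rates are 0.01 and 0.02, so the adjusted rate is 0.75 * 0.01 + 0.25 * 0.02 = 0.0125, while the crude rate is 30 / 2000 = 0.015.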

r confidence-interval rate
3 Answers
4
votes

We can wrap the output of summarise in a list column, then unnest that list into separate columns:

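library(tidyverse)
library(epitools)   # provides ageadjust.direct()

df %>% 
   group_by(time, age, ethnic, gender) %>% 
   # store each group's result (a named numeric vector) in a list column
   summarise(age_adjust = list(ageadjust.direct(count = count,
            pop = pop, rate = rate, stdpop = weight))) %>%
   # turn each named vector into a one-row data frame
   mutate(age_adjust = map(age_adjust, as.data.frame.list))  %>% 
   # expand the list column into crude.rate, adj.rate, lci, uci
   unnest(cols = age_adjust)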
# A tibble: 20 x 8
# Groups:   time, age, ethnic [10]
#   time  age   ethnic gender crude.rate adj.rate     lci     uci
#   <fct> <fct> <fct>  <fct>       <dbl>    <dbl>   <dbl>   <dbl>
# 1 2010  40-44 M      F         0.00763  0.00763 0.00709 0.00820
# 2 2010  40-44 M      M         0.00763  0.00763 0.00709 0.00820
# 3 2010  40-44 NM     F         0.0281   0.0281  0.0257  0.0306 
# 4 2010  40-44 NM     M         0.0281   0.0281  0.0257  0.0306 
# 5 2011  45-49 M      F         0.0145   0.0145  0.0136  0.0155 
# 6 2011  45-49 M      M         0.0145   0.0145  0.0136  0.0155 
# 7 2011  45-49 NM     F         0.0425   0.0425  0.0399  0.0453 
# 8 2011  45-49 NM     M         0.0425   0.0425  0.0399  0.0453 
# 9 2012  50-54 M      F         0.0116   0.0116  0.0109  0.0124 
#10 2012  50-54 M      M         0.0116   0.0116  0.0109  0.0124 
#11 2012  50-54 NM     F         0.00708  0.00708 0.00607 0.00821
#12 2012  50-54 NM     M         0.00708  0.00708 0.00607 0.00821
#13 2013  55-59 M      F         0.0251   0.0251  0.0232  0.0271 
#14 2013  55-59 M      M         0.0251   0.0251  0.0232  0.0271 
#15 2013  55-59 NM     F         0.00733  0.00733 0.00678 0.00792
#16 2013  55-59 NM     M         0.00733  0.00733 0.00678 0.00792
#17 2014  60-64 M      F         0.0101   0.0101  0.00944 0.0109 
#18 2014  60-64 M      M         0.0101   0.0101  0.00944 0.0109 
#19 2014  60-64 NM     F         0.00916  0.00916 0.00852 0.00984
#20 2014  60-64 NM     M         0.00916  0.00916 0.00852 0.00984
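Note that in the output above adj.rate equals crude.rate in every row: because age is one of the grouping variables, each group holds a single age band, so there is nothing left to standardize over. A minimal sketch of the alternative, assuming the goal is one standardized rate per ethnic/gender subgroup across all age bands, is to leave age out of the grouping. The seeded data below mirrors the question's data frame but draws 20 independent values (an assumption made here so the example is reproducible), and `unnest_wider()` is swapped in for the `map()`/`unnest()` pair.

```r
library(dplyr)
library(tidyr)
library(epitools)

# Reproducible stand-in for the question's data (seed and 20 draws are assumptions)
set.seed(1)
df <- data.frame(
  age    = rep(c("40-44", "45-49", "50-54", "55-59", "60-64"), 4),
  weight = rep(c(0.38, 0.23, 0.19, 0.12, 0.08), 4),
  ethnic = rep(c(rep("M", 5), rep("NM", 5)), 2),
  gender = c(rep("M", 10), rep("F", 10)),
  pop    = round(runif(20, min = 10000, max = 99999)),
  count  = round(runif(20, min = 100, max = 999))
)

by_group <- df %>%
  group_by(ethnic, gender) %>%           # age is NOT a grouping variable
  summarise(res = list(ageadjust.direct(count = count, pop = pop,
                                        stdpop = weight)),
            .groups = "drop") %>%
  unnest_wider(res)                      # named vector -> one column per name

print(by_group)
```

Each of the four resulting rows is standardized over five age bands, so the crude and adjusted rates will generally differ.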

2
votes

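Simply use by to subset the data frame by one or more factors and pass each subset into a function. Here, by returns a list of data frames built from the function's return value, as described on its docs page. Outside by, you can then use do.call(rbind, ...) to bind all the data frames into one final data frame.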

library(epitools)

# The grouping columns must match the names in df ("ethnic", not "ethnicity")
age_adjust_test_list <- by(df, df[,c("time", "gender", "ethnic", "age")], function(sub) {
                            tmp <- ageadjust.direct(count = sub$count, pop = sub$pop, 
                                                    rate = sub$rate, stdpop = sub$weight)

                            # all rows of a subset share the same group values,
                            # so take the first (also works if the columns are factors)
                            data.frame(time = sub$time[1],
                                       gender = sub$gender[1],
                                       ethnic = sub$ethnic[1],
                                       age = sub$age[1],
                                       crude_rate = tmp[[1]],
                                       adj_rate = tmp[[2]],
                                       lower_CI = tmp[[3]],
                                       upper_CI = tmp[[4]])
                           })

final_df <- do.call(rbind, age_adjust_test_list)

For combinations that are not represented in the data frame, NULL will be returned. So filter as needed:

age_adjust_test_list <- Filter(function(x) !is.null(x), age_adjust_test_list)
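As an alternative to filtering out the NULL entries, a base R sketch with split() and drop = TRUE only creates subsets for combinations that actually occur. This is a variation on the approach above, not the answer's own code. The seeded data and the choice to group by ethnic and gender only are assumptions made for illustration.

```r
library(epitools)

# Reproducible stand-in for the question's data (seed and 20 draws are assumptions)
set.seed(1)
df <- data.frame(
  age    = rep(c("40-44", "45-49", "50-54", "55-59", "60-64"), 4),
  weight = rep(c(0.38, 0.23, 0.19, 0.12, 0.08), 4),
  ethnic = rep(c(rep("M", 5), rep("NM", 5)), 2),
  gender = c(rep("M", 10), rep("F", 10)),
  pop    = round(runif(20, min = 10000, max = 99999)),
  count  = round(runif(20, min = 100, max = 999))
)

# drop = TRUE discards factor combinations with no rows, so no NULLs appear
groups <- split(df, list(df$ethnic, df$gender), drop = TRUE)

rows <- lapply(groups, function(sub) {
  est <- ageadjust.direct(count = sub$count, pop = sub$pop,
                          stdpop = sub$weight)
  # as.list() spreads the named vector into crude.rate, adj.rate, lci, uci
  data.frame(ethnic = sub$ethnic[1], gender = sub$gender[1], as.list(est))
})

final_df <- do.call(rbind, rows)
print(final_df)
```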

0
votes

I tried Parfait's code above, but it gave me an error. Could someone share the corrected syntax? Thanks.
