simstudy package: duplicate key error and "variable referenced not previously defined" error

Question (votes: 0, answers: 1)

I am trying to run the following code and am getting several errors from the simstudy package.

library(simstudy)
clusterDef <- defData(varname = "u_3", dist = "normal", formula = 0, 
                   variance = 25.77, id="clus") #cluster-level random effect
clusterDef <- defData(clusterDef, varname = "error", dist = "normal", formula = 0, 
                   variance = 38.35) #error term
clusterDef <- defData(clusterDef, varname = "ind", dist = "nonrandom",   
                   formula = 25) #individuals per cluster

#Generate individual-level random effect and treatment variable 
indDef <- defDataAdd(varname = "u_2", dist = "normal", formula = 0,
                     variance = 120.62)

#Generate clusters of data
set.seed(12345)

cohortsw <- genData(3, clusterDef)
cohortswTm <- addPeriods(cohortsw, nPeriods = 6, idvars = "clus", perName = "period")
cohortswTm <- trtStepWedge(cohortswTm, "clus", nWaves = 3, lenWaves = 1, startPer = 1, grpName = "trt")
cohortswTm <- genCluster(cohortswTm, cLevelVar = "clus", numIndsVar = "ind", level1ID = "id")

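Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 2700 rows; more than 468 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.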

cohortswTm <- addColumns(indDef, cohortswTm)

#Define coefficients for time as a categorical variable 
timecoeff1 <- -5.42
timecoeff2 <- -5.72
timecoeff3 <- -7.03
timecoeff4 <- -6.13
timecoeff5 <- -9.13

#Generate outcome y 
y <- defDataAdd(varname = "Y", formula = "17.87 + 5.0*trt + timecoeff1*I(period == 1) + timecoeff2*I(period == 2) + timecoeff3*I(period == 3) + timecoeff4*I(period == 4) + timecoeff5*I(period == 5) + u_3 + u_2 + error", dist = "normal")

#Add outcome to dataset
cohortswTm <- addColumns(y, cohortswTm)

Error: variable(s) referenced not previously defined: timecoeff1, timecoeff2, timecoeff3, timecoeff4, timecoeff5

Does anyone know why I am getting the two errors shown above? How can I fix the code to prevent them?

Many thanks for your help.

1 answer (0 votes)

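The first error occurs because you are trying to create individual-level data within each cluster, but each cluster appears repeatedly (once for each of the 6 periods). genCluster expects cLevelVar to be a unique ID. Here you can use timeID, the row ID that addPeriods creates for each cluster-period combination, as the cluster-level variable. Since ind is 25, this generates 25 individuals in each cluster in each time period. Change the genCluster call to:
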
cohortswTm <- genCluster(cohortswTm, cLevelVar = "timeID", 
      numIndsVar = "ind", level1ID = "id")

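This code creates a "closed" cohort, in which each individual is observed in only a single period. Generating an "open" cohort, in which individuals may also be observed over time, takes more work; the original answer linked to a description of that approach.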
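
The uniqueness requirement can be checked before calling genCluster. The sketch below uses base R only and a hypothetical stand-in data frame (`periods`) with the same shape as the addPeriods output in the question: 3 clusters, 6 periods, and 25 individuals per row. It is an illustration of the check, not simstudy's internal logic.

```r
# Hypothetical stand-in for the addPeriods() result: 3 clusters x 6 periods.
periods <- expand.grid(period = 0:5, clus = 1:3)
periods$timeID <- seq_len(nrow(periods))  # one unique ID per cluster-period row
periods$ind <- 25                         # individuals to generate per row

# A column is usable as a unique key only if it has no repeated values.
clus_repeats   <- anyDuplicated(periods$clus) > 0    # each cluster appears six times
timeID_repeats <- anyDuplicated(periods$timeID) > 0  # every row has its own timeID

print(c(clus = clus_repeats, timeID = timeID_repeats))

# Number of individual-level rows expected when keyed on timeID.
n_individuals <- sum(periods$ind)
print(n_individuals)
```

With these numbers the stand-in has 18 cluster-period rows and 450 individuals, which matches the 468 = nrow(x)+nrow(i) reported in the error message.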

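The second error occurs because a simstudy data definition can only refer to variables that are defined within the data definitions themselves, not to objects in your R workspace. Any constants therefore have to be written into the formula. (If you want to explore the effect of different coefficient values, you can use updateDef and updateDefAdd to update the formula itself.)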

Here is how y can be defined:

y <- defDataAdd(varname = "Y", formula = "17.87 + 5.0*trt - 
    5.42*I(period == 1) - 5.72*I(period == 2) - 7.03*I(period == 3) -
    6.13*I(period == 4) - 9.13*I(period == 5) + u_3 + u_2 + error", 
    dist = "normal")
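
If you would rather keep the coefficients in R variables, one option is to build the formula string programmatically so that the numbers end up written into it. The following is a minimal base R sketch of that idea; the names `timecoeff`, `period_terms` and `y_formula` are introduced here for illustration, and the final defDataAdd call is left as a comment because it assumes simstudy is loaded.

```r
# Period coefficients kept as ordinary R values (taken from the question).
timecoeff <- c(-5.42, -5.72, -7.03, -6.13, -9.13)

# One term per period, e.g. "(-5.42)*I(period == 1)"; sprintf() is vectorized.
period_terms <- sprintf("(%.2f)*I(period == %d)", timecoeff, seq_along(timecoeff))

# Join the fixed part, the period terms, and the random effects with " + ".
y_formula <- paste(c("17.87 + 5.0*trt", period_terms, "u_3 + u_2 + error"),
                   collapse = " + ")
print(y_formula)

# With simstudy loaded, the string is passed exactly like a hand-written one:
# y <- defDataAdd(varname = "Y", formula = y_formula, dist = "normal")
```

The resulting string contains only numeric constants and variables that exist in the data set (trt, period, u_3, u_2, error), so it no longer refers to anything outside the data definition.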