I am trying to run the following code and am getting several errors from the simstudy package.
library(simstudy)
clusterDef <- defData(varname = "u_3", dist = "normal", formula = 0,
variance = 25.77, id="clus") #cluster-level random effect
clusterDef <- defData(clusterDef, varname = "error", dist = "normal", formula = 0,
variance = 38.35) #error term
clusterDef <- defData(clusterDef, varname = "ind", dist = "nonrandom",
formula = 25) #individuals per cluster
#Generate individual-level random effect and treatment variable
indDef <- defDataAdd(varname = "u_2", dist = "normal", formula = 0,
variance = 120.62)
#Generate clusters of data
set.seed(12345)
cohortsw <- genData(3, clusterDef)
cohortswTm <- addPeriods(cohortsw, nPeriods = 6, idvars = "clus", perName = "period")
cohortswTm <- trtStepWedge(cohortswTm, "clus", nWaves = 3, lenWaves = 1, startPer = 1, grpName = "trt")
cohortswTm <- genCluster(cohortswTm, cLevelVar = "clus", numIndsVar = "ind", level1ID = "id")
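Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 2700 rows; more than 468 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.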
cohortswTm <- addColumns(indDef, cohortswTm)
#Define coefficients for time as a categorical variable
timecoeff1 <- -5.42
timecoeff2 <- -5.72
timecoeff3 <- -7.03
timecoeff4 <- -6.13
timecoeff5 <- -9.13
#Generate outcome y
y <- defDataAdd(varname = "Y", formula = "17.87 + 5.0*trt + timecoeff1*I(period == 1) + timecoeff2*I(period == 2) + timecoeff3*I(period == 3) + timecoeff4*I(period == 4) + timecoeff5*I(period == 5) + u_3 + u_2 + error", dist = "normal")
#Add outcome to dataset
cohortswTm <- addColumns(y, cohortswTm)
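Error: Variable(s) referenced not previously defined: timecoeff1, timecoeff2, timecoeff3, timecoeff4, timecoeff5
Does anyone know why I am getting the errors shown above? How would I fix the code to prevent them?
Any help is much appreciated.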
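The first error is generated because you are trying to create individual-level data within each cluster, but each cluster appears repeatedly (once for each of the 6 periods). genCluster expects cLevelVar to be a unique ID. In this case, the error can be avoided by pointing cLevelVar at timeID, the per-row ID that addPeriods creates, and modifying the genCluster command to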
cohortswTm <- genCluster(cohortswTm, cLevelVar = "timeID",
numIndsVar = "ind", level1ID = "id")
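This code creates a "closed" cohort, in which each individual is observed in only a single period. Generating an open cohort, in which individuals may also be observed over time, involves more work and is described here.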
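To see why a repeated key triggers the join error, here is a minimal base-R sketch. It does not call simstudy; it only mimics the join that genCluster appears to perform, which is an assumption inferred from the row counts in the error message (450 + 18 = 468 and 450 * 6 = 2700). The data frames clusterPeriods and individuals are hypothetical stand-ins, not objects from the question.

```r
# Stand-in for cohortswTm after addPeriods: 3 clusters x 6 periods = 18 rows.
clusterPeriods <- data.frame(
  clus   = rep(1:3, each = 6),
  period = rep(0:5, times = 3),
  timeID = 1:18,
  ind    = 25
)

# "clus" repeats once per period, while "timeID" identifies each row uniquely.
clusRepeats    <- anyDuplicated(clusterPeriods$clus) > 0
timeIDisUnique <- anyDuplicated(clusterPeriods$timeID) == 0

# One record per individual: 25 individuals for each of the 18 rows = 450 rows.
individuals <- data.frame(
  clus   = rep(clusterPeriods$clus,   times = clusterPeriods$ind),
  timeID = rep(clusterPeriods$timeID, times = clusterPeriods$ind)
)
individuals$id <- seq_len(nrow(individuals))

# Joining on the repeated key matches every individual to all 6 rows of its cluster.
byClus <- merge(individuals["clus"], clusterPeriods, by = "clus")

# Joining on the unique key matches every individual to exactly one row.
byTimeID <- merge(individuals[c("timeID", "id")], clusterPeriods, by = "timeID")

print(c(byClus = nrow(byClus), byTimeID = nrow(byTimeID)))
```

The join on clus grows to six times the number of individuals, which is the kind of blow-up data.table refuses to perform without allow.cartesian = TRUE, while the join on timeID keeps one row per individual.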
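The second error is generated because a simstudy data definition can only refer to variables that are defined within the context of the data definitions. Any constants must therefore be written directly into the formula. (If you want to explore the effects of different covariate levels, you can use updateDef and updateDefAdd to update the formula itself.)
This is how y can be defined: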
y <- defDataAdd(varname = "Y", formula = "17.87 + 5.0*trt -
5.42*I(period == 1) - 5.72*I(period == 2) - 7.03*I(period == 3) -
6.13*I(period == 4) - 9.13*I(period == 5) + u_3 + u_2 + error",
dist = "normal")
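If you would rather keep the coefficients in R variables than type them into the string by hand, one option is to build the formula string in base R first and then pass the finished string to defDataAdd. The sketch below is only an illustration: the names timecoeff, periodTerms and yFormula are hypothetical, and the defDataAdd call is left as a comment so the snippet runs without simstudy installed.

```r
# Period coefficients from the question, kept in one vector (periods 1 to 5).
timecoeff <- c(-5.42, -5.72, -7.03, -6.13, -9.13)

# Build one "coef*I(period == k)" term per period; negative values carry their own sign.
periodTerms <- sprintf("%.2f*I(period == %d)", timecoeff, seq_along(timecoeff))

# Join all terms with " + ". The result contains "+ -5.42*...", which is valid R arithmetic.
yFormula <- paste(
  c("17.87", "5.0*trt", periodTerms, "u_3", "u_2", "error"),
  collapse = " + "
)

print(yFormula)

# The finished string contains only numbers and data columns, so it can be used as:
# y <- defDataAdd(varname = "Y", formula = yFormula, dist = "normal")
```

Because the constants are substituted before simstudy sees the formula, the definition no longer refers to any variable outside the data.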