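I'm trying to generate dummy data by sampling from specific discrete distributions, conditional on the level of a factor (so the distribution differs for each factor level). I then want to insert each random result into a new data frame column, in the row that corresponds to that factor level. If you run the code below, you'll see that data$last is empty. I'm not sure what I'm doing wrong. I have also tried it without the loop, by setting the number of draws for each level to 100, but then the distribution is not correct.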
#Create data frame with factor
set.seed(1)
ID<-(1:200)
gender<-sample(x = c("Male","Female"), 200, replace = T, prob = c(0.5, 0.5))
data<-data.frame(ID,gender)
#Generate random response based on discrete distribution conditional on gender
data$last <- for (i in 1:nrow(data)) {if(data$gender=="Male") {
sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.8, 0.2))
} else {
sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.3, 0.7))
}
}
You should rewrite the for loop so that each data$last value is assigned inside the loop:
for (i in 1:nrow(data)) {
if(data$gender[i]=="Male") {
data$last[i] = sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.8, 0.2))
} else {
data$last[i] = sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.3, 0.7))
}
}
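As a side note on why the original attempt leaves the column empty: in R a for loop itself evaluates to NULL, whatever its body computes, so assigning the loop to data$last assigns NULL. A minimal sketch of that behaviour (the names result and df are made up for illustration):

```r
# A for loop evaluates to NULL, regardless of what its body computes
result <- for (i in 1:3) i * 2
is.null(result)   # TRUE

# Assigning NULL to a column that does not exist creates nothing
df <- data.frame(x = 1:3)
df$y <- result
names(df)         # "x"
```

This is why the values have to be stored from inside the loop body, one element at a time.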
Or without a for loop:
#Draw one candidate per row for each branch, so ifelse() never recycles a shorter vector
data$last = ifelse(data$gender=="Male",
sample(x = c("Today","Yesterday"), nrow(data), replace = T, prob = c(0.8, 0.2)),
sample(x = c("Today","Yesterday"), nrow(data), replace = T, prob = c(0.3, 0.7)))
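Another way to avoid the loop, sketched here rather than taken from the answer above, is to draw exactly as many values as there are rows in each group and place them with logical indexing. The helper name is_male is an assumption introduced for this example; the data frame is rebuilt so the block runs on its own.

```r
set.seed(1)
data <- data.frame(ID = 1:200,
                   gender = sample(c("Male", "Female"), 200, replace = TRUE))

# Logical index of the rows in the "Male" group (hypothetical helper name)
is_male <- data$gender == "Male"

# Start with an empty character column, then fill each group separately
data$last <- NA_character_
data$last[is_male] <- sample(c("Today", "Yesterday"), sum(is_male),
                             replace = TRUE, prob = c(0.8, 0.2))
data$last[!is_male] <- sample(c("Today", "Yesterday"), sum(!is_male),
                              replace = TRUE, prob = c(0.3, 0.7))

# Row-wise proportions of each response within each gender
prop.table(table(data$gender, data$last), margin = 1)
```

Each row gets its own independent draw, and the number of draws per group always matches the group size.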
#Generate random response based on discrete distribution conditional on gender
data$last <- sapply(1:nrow(data),function(i){if(data$gender[i]=="Male") {
s =sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.8, 0.2))
} else {
s = sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.3, 0.7))
}
return(s)
})
Note that this checks the specific data$gender[i] rather than the whole vector. It also uses return(s) to return the result.
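If the factor has more than two levels, the if/else can be replaced by a lookup of probabilities per level. The following is only a sketch under that assumption: the names outcomes and probs are hypothetical, and the probabilities are the ones from the question.

```r
set.seed(1)
data <- data.frame(ID = 1:200,
                   gender = sample(c("Male", "Female"), 200, replace = TRUE))

outcomes <- c("Today", "Yesterday")
# Hypothetical lookup table: one probability vector per factor level
probs <- list(Male = c(0.8, 0.2), Female = c(0.3, 0.7))

# One draw per row, using the probabilities that belong to that row's level
data$last <- vapply(
  as.character(data$gender),
  function(g) sample(outcomes, 1, prob = probs[[g]]),
  character(1),
  USE.NAMES = FALSE
)

# Row-wise proportions should land near 0.8/0.2 and 0.3/0.7
prop.table(table(data$gender, data$last), margin = 1)
```

Adding a level then only requires a new entry in probs, and the final table gives a quick check that the simulated proportions are close to the intended ones.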