I'm trying to sample a dummy variable according to the probabilities stored in my data.table. This works if my data.table has only two rows:
library(data.table)
playdata <- data.table(id = c("a","b"), probabilities = c(0.2, 0.3))
playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities)]
With three or more rows, it fails:
library(data.table)
playdata <- data.table(id = c("a","b","c"), probabilities = c(0.2, 0.3, 0.4))
playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities)]
Error in sample.int(length(x), size, replace, prob) :
incorrect number of probabilities
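Can someone explain this? I know I can force any function to be applied row by row, but why does sample break the standard data.table syntax? Shouldn't it do everything row by row anyway?
Edit: the usual workaround throws the same error: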
playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities), by = seq_len(nrow(playdata))]
I think you need to sample row by row, so I'll demonstrate with sapply:
set.seed(42)
playdata[, sampled_dummy := sapply(probabilities, function(prob) sample(0:1, size=1, prob=c(prob,1-prob)))]
#        id probabilities sampled_dummy
#    <char>         <num>         <int>
# 1:      a           0.2             0
# 2:      b           0.3             0
# 3:      c           0.4             1
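Though I suspect you might find it easier to use runif(.N) >= probabilities, which, like the sapply version, yields 1 with probability 1 - probabilities?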
playdata[, sampled_dummy := +(runif(.N) >= probabilities)]
#        id probabilities sampled_dummy
#    <char>         <num>         <int>
# 1:      a           0.2             1
# 2:      b           0.3             1
# 3:      c           0.4             0
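As for why the original call fails: sample() expects prob to hold one weight per element of x, and x here is c(0, 1), so it needs exactly two weights. Without by, data.table evaluates j once on the whole column, so prob receives three weights. With one row per group, it receives a single weight. The two-row case only runs because the lengths happen to match, and it then makes a single draw (weights 0.2 for 0 and 0.3 for 1) that is recycled to both rows. A minimal sketch of this, where the names res and res_by are just illustrative and set.seed(1) is an arbitrary choice:

```r
library(data.table)

playdata <- data.table(id = c("a", "b", "c"), probabilities = c(0.2, 0.3, 0.4))

# Without `by`, j sees the whole column: 3 weights for the 2 values in c(0, 1).
res <- tryCatch(
  playdata[, sample(c(0, 1), 1, prob = probabilities)],
  error = function(e) conditionMessage(e)
)
print(res)

# One row per group: `probabilities` now has length 1, which is still not 2.
res_by <- tryCatch(
  playdata[, sample(c(0, 1), 1, prob = probabilities),
           by = seq_len(nrow(playdata))],
  error = function(e) conditionMessage(e)
)
print(res_by)

# Supplying both weights per row works; as in the sapply version,
# 0 is drawn with probability `probabilities` and 1 otherwise.
set.seed(1)
playdata[, sampled_dummy := sample(c(0, 1), 1,
                                   prob = c(probabilities, 1 - probabilities)),
         by = seq_len(nrow(playdata))]
print(playdata)
```

Grouping by row does the same job as the sapply call, but the runif comparison is likely to be faster on large tables because it is fully vectorized.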