Sampling with row-specific probabilities in data.table

I am trying to sample a dummy variable based on the probabilities in my data.table. If my data.table has only two rows, this works:

library(data.table)
playdata <- data.table(id = c("a","b"), probabilities = c(0.2, 0.3))
playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities)]

If it has three or more rows, it does not:

library(data.table)
playdata <- data.table(id = c("a","b","c"), probabilities = c(0.2, 0.3, 0.4))
playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities)]

Error in sample.int(length(x), size, replace, prob) : 
  incorrect number of probabilities

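Can someone explain this? I know I can force any function to be applied row by row, but why does sample break the standard data.table syntax? Shouldn't it do everything row by row anyway?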

Edit: the usual workaround throws the same error:

playdata[, sampled_dummy := sample(c(0,1),1, prob = probabilities), by = seq_len(nrow(playdata))]
Tags: r, data.table, sample

1 Answer

I think you need to sample row by row, so I'll demonstrate with sapply:

set.seed(42)
playdata[, sampled_dummy := sapply(probabilities, function(prob) sample(0:1, size=1, prob=c(prob,1-prob)))]
#        id probabilities sampled_dummy
#    <char>         <num>         <int>
# 1:      a           0.2             0
# 2:      b           0.3             0
# 3:      c           0.4             1
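
As for why the original call fails: data.table evaluates the j expression once on the whole column (or once per group when by is used), so sample() receives the entire probabilities vector as prob. sample() requires prob to have the same length as x, which is 2 here. The following base R sketch reproduces the three cases from the question without data.table; the helper fails() is a hypothetical name introduced only for this illustration.

```r
# x has two elements, so sample() needs exactly two weights in prob.
x <- c(0, 1)

# Hypothetical helper for illustration: TRUE if evaluating expr raises an error.
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

# Two rows: the column has length 2, so the call runs. However, 0.2 and 0.3
# are used as weights for the values 0 and 1, and only one value is drawn,
# which := then recycles to every row.
two_rows_draw <- sample(x, 1, prob = c(0.2, 0.3))

# Three rows: prob has length 3 but x has length 2, hence
# "incorrect number of probabilities".
three_rows_fail <- fails(sample(x, 1, prob = c(0.2, 0.3, 0.4)))

# by = seq_len(nrow(playdata)): each group is one row, so prob has length 1,
# which again does not match length(x).
one_row_fail <- fails(sample(x, 1, prob = 0.2))

print(c(three_rows = three_rows_fail, one_row = one_row_fail))
```

So the two-row version only appears to work: it does not use a separate probability per row.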

I suspect, though, that it may be easier to use runif(.N) >= probabilities:

playdata[, sampled_dummy := +(runif(.N) >= probabilities)]
#        id probabilities sampled_dummy
#    <char>         <num>         <int>
# 1:      a           0.2             1
# 2:      b           0.3             1
# 3:      c           0.4             0
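
One thing to check is the direction of the probability. Both snippets above return 1 with probability 1 - probabilities (and 0 with probability probabilities). If the column is instead meant to be the chance of drawing a 1, one possible alternative, not part of the answer above, is rbinom(), which is vectorized over prob. The sketch below assumes that interpretation and compares the empirical shares of both approaches; the names n_rep, big, via_runif, via_rbinom and shares are made up for this example.

```r
library(data.table)

set.seed(1)
playdata <- data.table(id = c("a", "b", "c"), probabilities = c(0.2, 0.3, 0.4))

# One Bernoulli draw per row, with P(1) equal to that row's probability.
playdata[, dummy_rbinom := rbinom(.N, size = 1, prob = probabilities)]

# Empirical check: repeat every row many times and compare the share of 1s.
n_rep <- 1e5
big <- playdata[rep(seq_len(nrow(playdata)), each = n_rep),
                .(id, probabilities)]
big[, via_runif := +(runif(.N) >= probabilities)]                 # P(1) = 1 - p
big[, via_rbinom := rbinom(.N, size = 1, prob = probabilities)]   # P(1) = p

shares <- big[, .(share_runif = mean(via_runif),
                  share_rbinom = mean(via_rbinom)),
              by = .(id, probabilities)]
print(shares)
```

With this many repetitions, share_rbinom should land close to probabilities and share_runif close to 1 - probabilities. Flipping the comparison to runif(.N) < probabilities would make the runif version match the rbinom one.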