How to sample from a custom probability distribution in TensorFlow?


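I have a vector of N elements, e.g. V = [10, 30, 20, 50], and a probability vector P = [.2, .3, .1, .4]. In TensorFlow, how can I randomly sample K elements from V according to the given probability distribution P? I want the sampling to be done with replacement.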
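For reference, the semantics being asked for (K draws from V, with replacement, weighted by P) can be sketched in plain Python with the standard library's random.choices. This is not TensorFlow, only a baseline for checking a TensorFlow sampler's empirical frequencies against P; the large K and the seed are arbitrary choices.

```python
import random
from collections import Counter

V = [10, 30, 20, 50]        # values from the question
P = [0.2, 0.3, 0.1, 0.4]    # probabilities from the question
K = 100_000                 # large K (arbitrary) so frequencies settle near P

rng = random.Random(0)      # seeded for reproducibility
# random.choices draws with replacement; weights need not sum to 1
samples = rng.choices(V, weights=P, k=K)

# Empirical frequency of each value, to compare against P
counts = Counter(samples)
freqs = {v: counts[v] / K for v in V}
print(freqs)
```

With this many draws each frequency should land close to its entry in P; with only a few dozen draws, as in the outputs below, expect noticeable deviation.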

tensorflow sampling probability-distribution
2 Answers

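tf.nn.fixed_unigram_candidate_sampler does more or less what you want. The catch is that it only accepts int32 values for the unigrams argument (the probability distribution), because it was designed for multiclass problems with a very large number of classes, such as language processing. You can multiply the numbers in your probability distribution to turn them into integers, but only up to a precision limit.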

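Put the number of samples you want in num_sampled and the probability weights in unigrams (which must be int32). The true_classes argument must be filled with the same number of elements as num_true, but it is otherwise irrelevant, because you get indices back (and then use them to pull the samples). unique can be changed to True if you need sampling without replacement.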
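The "multiply to get integers" step and its precision limit can be illustrated in plain Python. to_integer_weights is a hypothetical helper, and scale=100 matches the "P, times 100" comment in the code below; any probability detail finer than 1/scale is lost.

```python
# Hypothetical helper: scale float probabilities to integer weights,
# as suggested for the integer `unigrams` argument.
def to_integer_weights(probs, scale=100):
    # round() rather than int(), so float error slightly below an
    # integer is not truncated downward
    return [round(p * scale) for p in probs]

P = [0.2, 0.3, 0.1, 0.4]
print(to_integer_weights(P))          # [20, 30, 10, 40]

# Precision limit: anything finer than 1/scale is rounded away.
fine = [0.123, 0.877]
weights = to_integer_weights(fine)    # [12, 88]
total = sum(weights)
print([w / total for w in weights])   # [0.12, 0.88], not [0.123, 0.877]
```

A larger scale keeps more precision, at the cost of larger integer weights.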

Here is tested code:

import tensorflow as tf
import numpy as np
sess = tf.Session()

V = tf.constant( np.array( [[ 10, 30, 20, 50 ]]), dtype=tf.int64)

sampled_ids, true_expected_count, sampled_expected_count = tf.nn.fixed_unigram_candidate_sampler(
   true_classes = V,
   num_true = 4,
   num_sampled = 50,
   unique = False,
   range_max = 4,
   unigrams = [ 20, 30, 10, 40 ] # this is P, times 100
)
sample = tf.gather( V[ 0 ], sampled_ids )
x = sess.run( sample )
print( x )

Output:

[50 20 10 30 30 30 10 30 20 50 50 50 10 50 10 30 50 50 30 30 50 10 20 30 50 50 50 50 30 50 50 30 50 50 50 50 50 50 50 10 50 30 50 10 50 50 10 30 50 50]

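If you really want to use float32 probability values, you have to build the sampler yourself out of several pieces (there is no single op for this), like this (tested code):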

import tensorflow as tf
import numpy as np
sess = tf.Session()

k = 50 # number of samples you want
V = tf.constant( [ 10, 30, 20, 50 ], dtype = tf.float32 ) # values
P = tf.constant( [ 0.2, 0.3, 0.1, 0.4 ], dtype = tf.float32 ) # prob dist

cum_dist = tf.cumsum( P ) # create cumulative probability distribution

# get random values between 0 and the max of cum_dist
# we'll determine where it is in the cumulative distribution
rand_unif = tf.random_uniform( shape=( k, ), minval = 0.0, maxval = tf.reduce_max( cum_dist ), dtype = tf.float32 )

# create boolean to signal where the random number is greater than the cum_dist
# take advantage of broadcasting to create Cartesian product
greater = tf.expand_dims( rand_unif, axis = -1 ) > tf.expand_dims( cum_dist, axis = 0 )

# we get the indices by counting how many are greater in any given row
idxs = tf.reduce_sum( tf.cast( greater, dtype = tf.int64 ), 1 )

# then just gather the sample from V by the indices
sample = tf.gather( V, idxs )

# run, output
print( sess.run( sample ) )

Output:

[20. 10. 50. 50. 20. 30. 10. 20. 30. 50. 20. 50. 30. 50. 30. 50. 50. 50. 50. 50. 50. 30. 20. 20. 20. 10. 50. 30. 30. 10. 50. 50. 50. 20. 30. 50. 30. 10. 50. 20. 30. 50. 30. 10. 10. 50. 50. 20. 50. 30.]
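The same inverse-CDF idea (cumulative sum, uniform draw, count how many cumulative entries the draw exceeds, gather) can be written step by step in plain Python to make the logic easier to follow. This is a sketch mirroring the TensorFlow code above, not a TensorFlow API; sample_inverse_cdf is a hypothetical name.

```python
import random
from itertools import accumulate

def sample_inverse_cdf(values, probs, k, rng=None):
    """Draw k values with replacement, mirroring the cumsum/compare/count steps."""
    rng = rng if rng is not None else random.Random()
    cum_dist = list(accumulate(probs))   # like tf.cumsum(P)
    total = cum_dist[-1]                 # like tf.reduce_max(cum_dist)
    samples = []
    for _ in range(k):
        u = rng.random() * total         # uniform draw in [0, total)
        # like reduce_sum(u > cum_dist): count entries strictly below u
        idx = sum(u > c for c in cum_dist)
        samples.append(values[idx])      # like tf.gather(V, idxs)
    return samples

V = [10, 30, 20, 50]
P = [0.2, 0.3, 0.1, 0.4]
samples = sample_inverse_cdf(V, P, 50, random.Random(0))
print(samples)
```

Because u never exceeds the last cumulative entry, idx stays within 0..N-1, and an entry with zero probability can never be selected (except for a draw of exactly 0.0 landing on a leading zero).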



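tf.distributions.Categorical() may offer a one-liner way to do this. According to this page, given a probability distribution P defined over N values, tf.distributions.Categorical() can generate the integers 0, 1, ..., N-1 with probabilities P[0], P[1], ..., P[N-1]. The generated integers can be interpreted as indices into the vector V. The following snippet illustrates this: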

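import tensorflow as tf  # TensorFlow 1.x API (tf.distributions, Tensor.eval)

# .eval() needs a default session; InteractiveSession installs itself as one
sess = tf.InteractiveSession()

# Probability distribution
P = [0.2, 0.3, 0.1, 0.4]

# Vector of values
V = [10, 30, 20, 50]

# Define categorical distribution
dist = tf.distributions.Categorical(probs=P)

# Generate a sample from categorical distribution - this serves as an index
index = dist.sample().eval()

# Fetch the value at V[index] as the sample
sample = V[index]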

All of this can be done in a one-liner:

sample = V[tf.distributions.Categorical(probs=P).sample().eval()]

To generate K samples from this distribution, wrap the one-liner above in a list comprehension:

samples = [ V[tf.distributions.Categorical(probs=P).sample().eval()] for i in range(K) ]

Output of the above code for K = 30:

[50, 10, 30, 50, 30, 20, 50, 30, 50, 50, 30, 50, 30, 50, 20, 10, 50, 20, 30, 30, 50, 50, 50, 30, 20, 50, 30, 30, 50, 50]

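There is probably a better way than a list comprehension: each iteration adds new ops to the graph and makes a separate session call, which gets slow for large K.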
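One plausible alternative is to draw all K indices in a single batch and then look them up in V. In the TF1 API, Distribution.sample takes a sample_shape argument, so tf.distributions.Categorical(probs=P).sample(K) should return K indices from one op, which could then go through tf.gather(V, ...). The sketch below shows only that "batch of indices, then gather" shape in plain Python (not TensorFlow); gather is a hypothetical stand-in for tf.gather.

```python
import random

def gather(values, indices):
    """Hypothetical stand-in for tf.gather on Python lists."""
    return [values[i] for i in indices]

P = [0.2, 0.3, 0.1, 0.4]
V = [10, 30, 20, 50]
K = 30

rng = random.Random(0)   # seeded for reproducibility
# One batched draw of K category indices in {0, ..., N-1}, weighted by P
indices = rng.choices(range(len(P)), weights=P, k=K)
samples = gather(V, indices)
print(samples)
```

Separating the index draw from the lookup is the same split the Categorical answer uses; the difference is only that the K draws happen in one call instead of K.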

© www.soinside.com 2019 - 2024. All rights reserved.