从低均值的Poisson分布得出图形的性能

Question

为了从C ++中的泊松分布中抽取随机数，通常建议使用

RNG_type rng;
std::poisson_distribution<size_t> d(1e-6);
auto r = d(rng);

[每次调用std::poisson_distribution对象时，将消耗整个随机位序列（例如std::mt19937为32位，std::mt19937_64为64位）。令我惊讶的是，在如此低的平均值（mean = 1e-6）下，绝大多数时候，只有几位足以确定要返回的值为0。然后可以将其他位缓存起来以备后用。

假设设置为true的位序列与泊松分布的高返回值相关联，则当使用均值1e-6时，任何不以19 true开头的序列都必须返回零！确实，

1 - 1/2^19 < P(0, 1e-6) < 1 - 1/2^20

，其中P(n, r)表示从具有均值n的泊松分布中得出r的概率。一种不浪费位的算法将使用一半的时间，一半的时间使用两位，四分之一的时间使用三位，....

是否有一种算法可以通过绘制泊松数时消耗尽可能少的位来提高性能？当我们认为平均值较低时，与std::poisson_distribution相比，还有其他方法可以提高性能吗？

回应@ Jarod42的评论谁说

如果使用更少的位，不会破坏等概率...

我认为这不会破坏等概率。在模糊测试中，我考虑了一个具有简单bernoulli分布的相同问题。我以概率1/2^4采样为true，以概率1 - 1/2^4采样为false。一旦在缓存中看到真，函数drawWithoutWastingBits就停止，并且函数drawWastingBits消耗4位，无论这些位是什么。

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <random>

bool drawWithoutWastingBits(std::vector<bool>& cache, size_t& cache_index)
{
    /* 
        Get a true with probability 1/2^4 (=1/16=0.0625) and a false otherwise
    */

    size_t nbTrues = 0;
    while (cache[cache_index])
    {
        ++nbTrues;
        ++cache_index;
        if (nbTrues == 4)
        {
            return true;
        }
    }
    ++cache_index;
    return false;
}


bool drawWastingBits(std::vector<bool>& cache, size_t& cache_index)
{
    /* 
        Get a true with probability 1/2^4 (=1/16=0.0625) and a false otherwise
    */

    bool isAnyTrue = false;
    for (size_t i = 0 ; i < 4; ++i)
    {
        if (cache[cache_index])
        {
            isAnyTrue = true;
        }
        ++cache_index;
    }
    return !isAnyTrue;
}

int main()
{
    /*
        Just cache a lot of bits in advance in `cache`. The same sequence of bits will be used by both function.
        I am just caching way enough bits to make sure they don't run out of bits below
        I made sure to have the same number of zeros and ones so that any deviation is caused by the methodology and not by the RNG
    */

    // Produce cache
    std::vector<bool> cache;
    size_t nbBitsToCache = 1e7;
    cache.reserve(nbBitsToCache);
    for (size_t i = 0 ; i < nbBitsToCache/2 ; ++i)
    {
        cache.push_back(false);
        cache.push_back(true);
    }
    // Shuffle cache
    {
        std::mt19937 mt(std::random_device{}());
        std::shuffle(cache.begin(), cache.end(), mt);
    }


    // Draw without wasting bits
    {
        size_t nbDraws = 1e6;
        size_t cache_index = 0;
        std::pair<size_t, size_t> outcomes = {0,0};
        for (size_t r = 0 ; r < nbDraws ; ++r)
        {
            drawWithoutWastingBits(cache, cache_index) ? ++outcomes.first : ++outcomes.second;
            assert(cache_index <= cache.size());
        }   

        assert(outcomes.first + outcomes.second == nbDraws);
        std::cout << "Draw Without Wasting Bits: prob true = " << (double)outcomes.first / nbDraws << "\n";
    }


    // Draw wasting bits
    {
        size_t nbDraws = 1e6;
        size_t cache_index = 0;
        std::pair<size_t, size_t> outcomes = {0,0};
        for (size_t r = 0 ; r < nbDraws ; ++r)
        {
            drawWastingBits(cache, cache_index) ? ++outcomes.first : ++outcomes.second;
            assert(cache_index <= cache.size());
        }   

        assert(outcomes.first + outcomes.second == nbDraws);
        std::cout << "Draw Wit Wasting Bits: prob true = " << (double)outcomes.first / nbDraws << "\n";
    }
}

可能的输出

Draw Without Wasting Bits: prob true = 0.062832
Draw Wit Wasting Bits: prob true = 0.062363

Answer 1

下一次事件的时间，其中0 < U ≤ 1是统一的随机数，而λ是事件概率。

https://preshing.com/20111007/how-to-generate-random-timings-for-a-poisson-process/

Answer 2

基于该算法，如果知道mean远小于1，则如果在[0，1]中生成统一的随机数u，则当u <= exp(-mean)时泊松变量将为0，并且否则大于0。

例如，假设平均值为1e-6。使用Lumbroso的Fast Dice Roller（2013）或i中提到的算法，使用一种最小化位浪费的算法在[0, 100000)中生成随机整数Math Forum。如果i大于0，则Poisson变量正好为0。否则，它为1或更大，并且需要“慢速”算法。以下算法基于第505页中的算法，但仅对Poisson变量1或更大的样本进行采样：

METHOD Poisson1OrGreater(mean) sum=1.0-Math.exp(-lam) prod=Math.exp(-lam) u=sum+RNDRANGE(1-sum, 1) i=0 while i=0 or u>sum prod*=lam/(i+1) sum-=prod i=i+1 end return i END METHOD

此方法不像它可能的那样健壮，特别是因为它使用接近1的数字（浮点空间更稀疏）而不是接近0的数字。

从低均值的Poisson分布得出图形的性能

问题描述投票：1回答：2

2个回答

最新问题

从低均值的Poisson分布得出图形的性能

问题描述 投票：1回答：2

2个回答

最新问题

问题描述投票：1回答：2