Boost asio thread_pool join不需要等待任务完成。

问题描述 投票:3回答:1

考虑功能

#include <iostream>
#include <boost/bind.hpp>
#include <boost/asio.hpp>

void foo(const uint64_t begin, uint64_t *result)
{
    uint64_t prev[] = {begin, 0};
    for (uint64_t i = 0; i < 1000000000; ++i)
    {
        const auto tmp = (prev[0] + prev[1]) % 1000;
        prev[1] = prev[0];
        prev[0] = tmp;
    }
    *result = prev[0];
}

void batch(boost::asio::thread_pool &pool, const uint64_t a[])
{
    uint64_t r[] = {0, 0};
    boost::asio::post(pool, boost::bind(foo, a[0], &r[0]));
    boost::asio::post(pool, boost::bind(foo, a[1], &r[1]));

    pool.join();
    std::cerr << "foo(" << a[0] << "): " << r[0] << " foo(" << a[1] << "): " << r[1] << std::endl;
}

哪儿 foo 是一个简单的 "纯 "函数,它对以下内容进行计算 begin 并将结果写到指针上。*result.这个函数被调用时,会有不同的输入,来自 batch. 在这里,将每个调用调度到另一个CPU核可能是有益的。

现在假设批处理函数被调用了几万次。因此,一个线程池将是很好的,它可以在所有连续的批处理调用之间共享。

试着这样做(为了简单起见,只调用3次)

int main(int argn, char **)
{
    boost::asio::thread_pool pool(2);

    const uint64_t a[] = {2, 4};
    batch(pool, a);

    const uint64_t b[] = {3, 5};
    batch(pool, b);

    const uint64_t c[] = {7, 9};
    batch(pool, c);
}

结果

foo(2): 2 foo(4): 4 foo(3): 0 foo(5): 0 foo(7): 0 foo(9): 0

其中,这三条线同时出现,而计算中的 foo 需要~3s。我假设只有第一个 join 真正等待线程池完成所有的工作,其他的结果都是无效的。(未初始化的值)这里重用线程池的最佳做法是什么?

c++11 threadpool boost-asio
1个回答
1
投票

我刚刚碰到了这个高级执行器的例子,它被隐藏在文档中。

我刚刚意识到Asio自带了一个... fork_executor 例子 它正是这样做的:你可以 "分组 "任务,并参加 执行人 代表该组)而不是池。 我已经错过了很长时间,因为HTML文档中没有列出任何一个执行器的例子--。抓住 21分钟前

那么不说了,下面就是那个适用于你问题的样本。

"Live On Coliru

#define BOOST_BIND_NO_PLACEHOLDERS
#include <boost/asio/thread_pool.hpp>
#include <boost/asio/ts/executor.hpp>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// A fixed-size thread pool used to implement fork/join semantics. Functions
// are scheduled using a simple FIFO queue. Implementing work stealing, or
// using a queue based on atomic operations, are left as tasks for the reader.
class fork_join_pool : public boost::asio::execution_context {
  public:
    // The constructor starts a thread pool with the specified number of
    // threads. Note that the thread_count is not a fixed limit on the pool's
    // concurrency. Additional threads may temporarily be added to the pool if
    // they join a fork_executor.
    explicit fork_join_pool(std::size_t thread_count = std::thread::hardware_concurrency()*2)
            : use_count_(1), threads_(thread_count)
    {
        try {
            // Ask each thread in the pool to dequeue and execute functions
            // until it is time to shut down, i.e. the use count is zero.
            for (thread_count_ = 0; thread_count_ < thread_count; ++thread_count_) {
                boost::asio::dispatch(threads_, [&] {
                    std::unique_lock<std::mutex> lock(mutex_);
                    while (use_count_ > 0)
                        if (!execute_next(lock))
                            condition_.wait(lock);
                });
            }
        } catch (...) {
            stop_threads();
            threads_.join();
            throw;
        }
    }

    // The destructor waits for the pool to finish executing functions.
    ~fork_join_pool() {
        stop_threads();
        threads_.join();
    }

  private:
    friend class fork_executor;

    // The base for all functions that are queued in the pool.
    struct function_base {
        std::shared_ptr<std::size_t> work_count_;
        void (*execute_)(std::shared_ptr<function_base>& p);
    };

    // Execute the next function from the queue, if any. Returns true if a
    // function was executed, and false if the queue was empty.
    bool execute_next(std::unique_lock<std::mutex>& lock) {
        if (queue_.empty())
            return false;
        auto p(queue_.front());
        queue_.pop();
        lock.unlock();
        execute(lock, p);
        return true;
    }

    // Execute a function and decrement the outstanding work.
    void execute(std::unique_lock<std::mutex>& lock,
                 std::shared_ptr<function_base>& p) {
        std::shared_ptr<std::size_t> work_count(std::move(p->work_count_));
        try {
            p->execute_(p);
            lock.lock();
            do_work_finished(work_count);
        } catch (...) {
            lock.lock();
            do_work_finished(work_count);
            throw;
        }
    }

    // Increment outstanding work.
    void
    do_work_started(const std::shared_ptr<std::size_t>& work_count) noexcept {
        if (++(*work_count) == 1)
            ++use_count_;
    }

    // Decrement outstanding work. Notify waiting threads if we run out.
    void
    do_work_finished(const std::shared_ptr<std::size_t>& work_count) noexcept {
        if (--(*work_count) == 0) {
            --use_count_;
            condition_.notify_all();
        }
    }

    // Dispatch a function, executing it immediately if the queue is already
    // loaded. Otherwise adds the function to the queue and wakes a thread.
    void do_dispatch(std::shared_ptr<function_base> p,
                     const std::shared_ptr<std::size_t>& work_count) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (queue_.size() > thread_count_ * 16) {
            do_work_started(work_count);
            lock.unlock();
            execute(lock, p);
        } else {
            queue_.push(p);
            do_work_started(work_count);
            condition_.notify_one();
        }
    }

    // Add a function to the queue and wake a thread.
    void do_post(std::shared_ptr<function_base> p,
                 const std::shared_ptr<std::size_t>& work_count) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(p);
        do_work_started(work_count);
        condition_.notify_one();
    }

    // Ask all threads to shut down.
    void stop_threads() {
        std::lock_guard<std::mutex> lock(mutex_);
        --use_count_;
        condition_.notify_all();
    }

    std::mutex mutex_;
    std::condition_variable condition_;
    std::queue<std::shared_ptr<function_base>> queue_;
    std::size_t use_count_;
    std::size_t thread_count_;
    boost::asio::thread_pool threads_;
};

// A class that satisfies the Executor requirements. Every function or piece of
// work associated with a fork_executor is part of a single, joinable group.
class fork_executor {
  public:
    fork_executor(fork_join_pool& ctx)
            : context_(ctx), work_count_(std::make_shared<std::size_t>(0)) {}

    fork_join_pool& context() const noexcept { return context_; }

    void on_work_started() const noexcept {
        std::lock_guard<std::mutex> lock(context_.mutex_);
        context_.do_work_started(work_count_);
    }

    void on_work_finished() const noexcept {
        std::lock_guard<std::mutex> lock(context_.mutex_);
        context_.do_work_finished(work_count_);
    }

    template <class Func, class Alloc>
    void dispatch(Func&& f, const Alloc& a) const {
        auto p(std::allocate_shared<exFun<Func>>(
            typename std::allocator_traits<Alloc>::template rebind_alloc<char>(a),
            std::move(f), work_count_));
        context_.do_dispatch(p, work_count_);
    }

    template <class Func, class Alloc> void post(Func f, const Alloc& a) const {
        auto p(std::allocate_shared<exFun<Func>>(
            typename std::allocator_traits<Alloc>::template rebind_alloc<char>(a),
            std::move(f), work_count_));
        context_.do_post(p, work_count_);
    }

    template <class Func, class Alloc>
    void defer(Func&& f, const Alloc& a) const {
        post(std::forward<Func>(f), a);
    }

    friend bool operator==(const fork_executor& a, const fork_executor& b) noexcept {
        return a.work_count_ == b.work_count_;
    }

    friend bool operator!=(const fork_executor& a, const fork_executor& b) noexcept {
        return a.work_count_ != b.work_count_;
    }

    // Block until all work associated with the executor is complete. While it
    // is waiting, the thread may be borrowed to execute functions from the
    // queue.
    void join() const {
        std::unique_lock<std::mutex> lock(context_.mutex_);
        while (*work_count_ > 0)
            if (!context_.execute_next(lock))
                context_.condition_.wait(lock);
    }

  private:
    template <class Func> struct exFun : fork_join_pool::function_base {
        explicit exFun(Func f, const std::shared_ptr<std::size_t>& w)
                : function_(std::move(f)) {
            work_count_ = w;
            execute_ = [](std::shared_ptr<fork_join_pool::function_base>& p) {
                Func tmp(std::move(static_cast<exFun*>(p.get())->function_));
                p.reset();
                tmp();
            };
        }

        Func function_;
    };

    fork_join_pool& context_;
    std::shared_ptr<std::size_t> work_count_;
};

// Helper class to automatically join a fork_executor when exiting a scope.
class join_guard {
  public:
    explicit join_guard(const fork_executor& ex) : ex_(ex) {}
    join_guard(const join_guard&) = delete;
    join_guard(join_guard&&) = delete;
    ~join_guard() { ex_.join(); }

  private:
    fork_executor ex_;
};

//------------------------------------------------------------------------------

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>
#include <boost/bind.hpp>

static void foo(const uint64_t begin, uint64_t *result)
{
    uint64_t prev[] = {begin, 0};
    for (uint64_t i = 0; i < 1000000000; ++i) {
        const auto tmp = (prev[0] + prev[1]) % 1000;
        prev[1] = prev[0];
        prev[0] = tmp;
    }
    *result = prev[0];
}

void batch(fork_join_pool &pool, const uint64_t (&a)[2])
{
    uint64_t r[] = {0, 0};
    {
        fork_executor fork(pool);
        join_guard join(fork);
        boost::asio::post(fork, boost::bind(foo, a[0], &r[0]));
        boost::asio::post(fork, boost::bind(foo, a[1], &r[1]));
        // fork.join(); // or let join_guard destructor run
    }
    std::cerr << "foo(" << a[0] << "): " << r[0] << " foo(" << a[1] << "): " << r[1] << std::endl;
}

int main() {
    fork_join_pool pool;

    batch(pool, {2, 4});
    batch(pool, {3, 5});
    batch(pool, {7, 9});
}

印刷品。

foo(2): 2 foo(4): 4
foo(3): 503 foo(5): 505
foo(7): 507 foo(9): 509

需要注意的事情。

  • 执行器可以重叠嵌套:你可以在一个fork_join_pool上使用多个可加入的fork_executors,它们将加入每个执行器的不同任务组。

当你看库的例子时,你可以很容易地得到这种感觉(它做了一个递归的划分和征服合并排序)。


3
投票

最好的做法是不要重复使用池子(如果你一直创建新的池子,那么池子有什么用呢)。

如果你想确保你的批次 "时间 "在一起,我建议使用 when_all 关于期货。

Live On Coliru

#define BOOST_THREAD_PROVIDES_FUTURE_WHEN_ALL_WHEN_ANY
#include <iostream>
#include <boost/bind.hpp>
#include <boost/asio.hpp>
#include <boost/thread.hpp>

uint64_t foo(uint64_t begin) {
    uint64_t prev[] = {begin, 0};
    for (uint64_t i = 0; i < 1000000000; ++i) {
        const auto tmp = (prev[0] + prev[1]) % 1000;
        prev[1] = prev[0];
        prev[0] = tmp;
    }
    return prev[0];
}

void batch(boost::asio::thread_pool &pool, const uint64_t a[2])
{
    using T = boost::packaged_task<uint64_t>;

    T tasks[] {
        T(boost::bind(foo, a[0])),
        T(boost::bind(foo, a[1])),
    };

    auto all = boost::when_all(
        tasks[0].get_future(),
        tasks[1].get_future());

    for (auto& t : tasks)
        post(pool, std::move(t));

    auto [r0, r1] = all.get();
    std::cerr << "foo(" << a[0] << "): " << r0.get() << " foo(" << a[1] << "): " << r1.get() << std::endl;
}

int main() {
    boost::asio::thread_pool pool(2);

    const uint64_t a[] = {2, 4};
    batch(pool, a);

    const uint64_t b[] = {3, 5};
    batch(pool, b);

    const uint64_t c[] = {7, 9};
    batch(pool, c);
}

印刷品

foo(2): 2 foo(4): 4
foo(3): 503 foo(5): 505
foo(7): 507 foo(9): 509

我会考虑

  • 普遍化
  • 信息排队

一般化

不硬性规定批次大小,使其更加灵活。毕竟,池子的大小已经固定了,我们不需要 "确保批次合适 "之类的东西。

Live On Coliru

#define BOOST_THREAD_PROVIDES_FUTURE_WHEN_ALL_WHEN_ANY
#include <iostream>
#include <boost/bind.hpp>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <boost/thread/future.hpp>

struct Result { uint64_t begin, result; };

Result foo(uint64_t begin) {
    uint64_t prev[] = {begin, 0};
    for (uint64_t i = 0; i < 1000000000; ++i) {
        const auto tmp = (prev[0] + prev[1]) % 1000;
        prev[1] = prev[0];
        prev[0] = tmp;
    }
    return { begin, prev[0] };
}

void batch(boost::asio::thread_pool &pool, std::vector<uint64_t> const a)
{
    using T = boost::packaged_task<Result>;
    std::vector<T> tasks;
    tasks.reserve(a.size());

    for(auto begin : a)
        tasks.emplace_back(boost::bind(foo, begin));

    std::vector<boost::unique_future<T::result_type> > futures;
    for (auto& t : tasks) {
        futures.push_back(t.get_future());
        post(pool, std::move(t));
    }

    for (auto& fut : boost::when_all(futures.begin(), futures.end()).get()) {
        auto r = fut.get();
        std::cerr << "foo(" << r.begin << "): " << r.result << " ";
    }
    std::cout << std::endl;
}

int main() {
    boost::asio::thread_pool pool(2);

    batch(pool, {2});
    batch(pool, {4, 3, 5});
    batch(pool, {7, 9});
}

印刷品

foo(2): 2 
foo(4): 4 foo(3): 503 foo(5): 505 
foo(7): 507 foo(9): 509 

一般化2:变式简单化

与流行的看法相反(说实话,通常会发生什么),这次我们可以利用变量来摆脱所有的中间向量(每一个)。

Live On Coliru

void batch(boost::asio::thread_pool &pool, T... a)
{
    auto launch = [&pool](uint64_t begin) {
        boost::packaged_task<Result> pt(boost::bind(foo, begin));
        auto fut = pt.get_future();
        post(pool, std::move(pt));
        return fut;
    };

    for (auto& r : {launch(a).get()...}) {
        std::cerr << "foo(" << r.begin << "): " << r.result << " ";
    }

    std::cout << std::endl;
}

如果你坚持要及时输出结果,你仍然可以添加 when_all (需要更多的英雄主义来解开元组的包装)。

Live On Coliru

template <typename...T>
void batch(boost::asio::thread_pool &pool, T... a)
{
    auto launch = [&pool](uint64_t begin) {
        boost::packaged_task<Result> pt(boost::bind(foo, begin));
        auto fut = pt.get_future();
        post(pool, std::move(pt));
        return fut;
    };

    std::apply([](auto&&... rfut) {
        Result results[] {rfut.get()...};
        for (auto& r : results) {
            std::cerr << "foo(" << r.begin << "): " << r.result << " ";
        }
    }, boost::when_all(launch(a)...).get());

    std::cout << std::endl;
}

两者的打印结果仍然相同

消息排队

这是很自然的提升,也算是跳过了大部分的复杂性。如果你还想按批次组报,就得协调了。

Live On Coliru

#include <iostream>
#include <boost/asio.hpp>
#include <memory>

struct Result { uint64_t begin, result; };

Result foo(uint64_t begin) {
    uint64_t prev[] = {begin, 0};
    for (uint64_t i = 0; i < 1000000000; ++i) {
        const auto tmp = (prev[0] + prev[1]) % 1000;
        prev[1] = prev[0];
        prev[0] = tmp;
    }
    return { begin, prev[0] };
}

using Group = std::shared_ptr<size_t>;
void batch(boost::asio::thread_pool &pool, std::vector<uint64_t> begins) {
    auto group = std::make_shared<std::vector<Result> >(begins.size());

    for (size_t i=0; i < begins.size(); ++i) {
        post(pool, [i,begin=begins.at(i),group] {
              (*group)[i] = foo(begin);
              if (group.unique()) {
                  for (auto& r : *group) {
                      std::cout << "foo(" << r.begin << "): " << r.result << " ";
                      std::cout << std::endl;
                  }
              }
          });
    }
}

int main() {
    boost::asio::thread_pool pool(2);

    batch(pool, {2});
    batch(pool, {4, 3, 5});
    batch(pool, {7, 9});
    pool.join();
}

请注意,这是有并发访问 group由于元素访问的限制,它是安全的。

打印。

foo(2): 2 
foo(4): 4 foo(3): 503 foo(5): 505 
foo(7): 507 foo(9): 509 
© www.soinside.com 2019 - 2024. All rights reserved.