增强合并容器

问题描述 投票:0回答:1

Boost 没有给出合并集合或多重集的合并方法。

所以我需要做这样的事情来合并。

int main() {
    boost::container::flat_multiset<int> set1 = {1, 2, 3};
    boost::container::flat_multiset<int> set2 = {3, 4, 5};

    // Manually insert elements from set2 into set1
    for (const auto& value : set2) {
        set1.insert(value);
    }
}

高效吗? 多套如何处理?

c++ boost boost-container
1个回答
0
投票

所以,我继续检查我的预测:

@TedZach 为什么? ordered_range 插入可能(更)高效。 – 瑟赫 1小时前

事实上,使用专门的顺序插入比简单的方法合并 100 个包含 10'000 个元素的随机平面多重集要快很多。正确地使用 std::merge

  是非常笨拙的,因为无论如何你都无法直接插入到目的地:

如果目标范围与以下任意一个重叠,则行为未定义 输入范围(输入范围可能相互重叠)。

所以不,你绝对不想要

std::merge

CAN做了很多工作并手动就地合并到固定序列中,然后采用多重集,但这不值得付出努力,而且无论如何都只是与有序插入相同。如果您需要支持唯一的以及多个集合,或者具有自定义顺序谓词的集合,这一点会变得更加明显。

住在Coliru

#include <algorithm>
#include <boost/container/flat_set.hpp>
#include <boost/container_hash/hash.hpp>
#include <chrono>
#include <functional>
#include <iostream>
#include <random>
#include <ranges>
#include <vector>
namespace bc = boost::container;
using set_t  = bc::flat_multiset<int>;
using data_t = std::vector<set_t>;
using namespace std::chrono_literals;

namespace /*file static*/ {
    auto generate_sets(unsigned num_sets, unsigned set_size = 10'000) {
        data_t sets(num_sets);

        std::mt19937 prng{std::random_device{}()};
        auto         gen = bind(std::uniform_int_distribution<int>{-9'999, +9'999}, prng);

        for (auto& set : sets)
            generate_n(inserter(set, set.end()), set_size, gen);

        return sets;
    }

    // the naive approach
    set_t naive_merge(data_t const& sets) {
        set_t r;
        for (auto const& s : sets)
            r.insert(s.begin(), s.end());
        return r;
    }

    // std::merge really doesn't fit the bill here:
    set_t std_merge(data_t const& sets) {
        if (sets.empty())
            return {};
        set_t r(sets.front()), tmp;
        for (auto const& s : sets | std::views::drop(1)) {
            tmp.clear();
            std::merge(r.begin(), r.end(), s.begin(), s.end(), inserter(tmp, tmp.end()));
            r.swap(tmp);
        }
        return r;
    }

    // the correct usage of ordered flatmultiset::insert!
    set_t fast_merge(data_t const& sets) {
        set_t r;
        for (auto const& s : sets)
            r.insert(bc::ordered_range, s.begin(), s.end());
        return r;
    }

    // this is the manual to do similar to fast_merge:
    set_t manual_inplace(data_t const& sets) {
        set_t::sequence_type underlying;
        // preallocate
        underlying.reserve( //
            accumulate(sets.begin(), sets.end(), size_t{},
                       [](size_t acc, set_t const& s) { return acc + s.size(); }));

        // using repeated inplace_merge:
        for (auto const& s : sets) {
            underlying.reserve(underlying.size() + s.size()); // avoid iterator invalidation

            auto m = underlying.end();
            underlying.insert(m, s.begin(), s.end());
            std::inplace_merge(underlying.begin(), m, underlying.end());
        }

        // adopt the sequence!
        return {bc::ordered_range, underlying.begin(), underlying.end()};
    }

} // namespace

int main() {
    auto bench = [sets = generate_sets(100, 10'000)](auto caption, set_t (*f)(data_t const&)) {
        static auto constexpr now = std::chrono::high_resolution_clock::now;

        auto start = now();
        auto r     = f(sets);
        auto stop  = now();

        size_t checksum = boost::hash_range(begin(r), end(r));
        std::cout << caption << "\t" << (stop - start) / 1.ms << "ms\t" //
                  << std::hex << checksum << std::dec << "\n";
        return r;
    };

    // benchmark
    bench("naive_merge",    naive_merge);
    bench("std_merge",      std_merge);
    bench("fast_merge",     fast_merge);
    bench("manual_inplace", manual_inplace);
}

印刷

naive_merge 96.4958ms   e81a4707eb9e51c6
std_merge   573.727ms   e81a4707eb9e51c6
fast_merge  67.776ms    e81a4707eb9e51c6
manual_inplace  78.4506ms   e81a4707eb9e51c6

或本地:


© www.soinside.com 2019 - 2024. All rights reserved.