使用 kNN 的自定义距离函数来提升 R 树

Question

几天以来，我一直在尝试弄清楚如何传递自定义距离函数来增强 R 树以进行 k 最近邻查询。

我有大量的线串和多边形。我将两者嵌入到 boost 边界框中，并在边界框上创建 R 树：

using RTreeValue = std::pair<BoostBox, FeatureId>;

对于空间查询（即

contains

、

covers

、

intersects

等），这非常有效且快速：

rtree.query(boost::geometry::index::intersects(query_shape), custom_filter);

第二个参数是一个自定义谓词，用于过滤掉边界框与查询形状相交但实际几何图形（线串、多边形）不与其相交的对象（基于

FeatureId

）。

问题来自于k-最近邻搜索。我想在边界框上使用same索引（R树）而不是使用实际的几何形状。原因是线串和多边形具有大量点，这可能会显着减慢查询速度。第二个原因是内存有限：我没有足够的内存（它是复杂系统的一部分）来在实际几何图形上创建另一个索引。

因此，我在边界框上有一个现有索引 (

RTreeValue

)。当我开始

bgi::nearest

查询时

std::vector<RTreeValue> results;
rtree.query(boost::geometry::index::nearest(point, k), custom_evaluation(results));

绝大多数结果是不正确。原因是某些边界框可以彼此相交 (

distance = 0

)，但它们的实际几何形状可能相距很远。 query 调用中的第二个参数仅被调用多次，与搜索到的邻居的数量相同，并且在计算两个对象之间的距离后调用它（因此它只能用于从最终结果中过滤掉对象） .

我一直在尝试

boost::geometry::indexable

，但尚未实现目标：它的目的是从输入对象提供可索引的几何图形来构建 R 树（boost 边界框当前是可索引的）。

任何想法，如何将自定义距离函数传递给 k 最近邻搜索，这将基于

FeatureId

计算实际距离？

Answer 1

正如评论中提到的，在 R 树遍历期间（即在

query(...)

期间）计算查询点到实际对象（折线、多边形）的距离结果证明计算成本相当高（在我的情况下，折线/多边形有很多点）。

因此，作为一个明显更快的解决方案，我实现了一个两步算法：第一步，它根据存储在 R 树中的边界框找到 k-最近邻居。第 k 个邻居将提供距离 d，在第二步中，该距离将再次用于边界框上的范围查询。

因此，仅针对两个查询之间的 k 对象计算查询点和折线/多边形之间的昂贵距离，然后对于范围查询的结果以及树遍历期间，仅计算便宜的

point-to-box

距离使用过。

该解决方案保证找到 k 最近的对象，基于以下事实：

distance(point-bbox) <= distance(point-object)

.

下面是简化的C++代码：

template<typename Type>
void queryKNN(const BoostPoint2d& p, uint32_t N,
              std::vector<std::pair<Type, double>>* result) {
    // 1. Find k-nearest neighbors in the R-Tree based on bounding boxes. With very high
    //    probability the found neighbors will NOT be the real closest ones. This step is
    //    necessary to get an approximate searching radius for the subsequent range search.
    std::vector<RTreeValue> tempObjs;
    // queryFilter can filter out objects based on their ID.
    _index.rtree()->query(
            boost::geometry::index::nearest(p, N)
                    && boost::geometry::index::satisfies(queryFilter),
            std::back_inserter(tempObjs));

    // 2. Get the "true" distance from the farthest object.
    //    True distance is defined as the distance from the query point to the actual object
    //    geometry (rather than to the bounding box of the object).
    //    Because the distance calculation is computationally extensive, cache the distances.
    double maxDist = 0.0;
    for (size_t i = 0; i < tempObjs.size(); ++i) {
        const auto object = getObjectByIdFn(tempObjs[i].second);
        maxDist = std::max(maxDist, PointToLineDistance(p, object.polyline));
    }

    // 3. Based on the k-th object, create a search box for a range query and perform the query.
    const auto newBox = BoostBox(
            BoostPoint(p.x - maxDist - EPSILON_M, p.y - maxDist - EPSILON_M),
            BoostPoint(p.x + maxDist + EPSILON_M, p.y + maxDist + EPSILON_M));
    tempObjs.clear();
    _index.rtree()->query(
            boost::geometry::index::intersects(newBox), std::back_inserter(tempObjs));

    // 4. Get the k-nearest neighbors based on their true distance. Because the range query
    //    can return significantly higher number of objects than the requested number 'k', use
    //    a k-max heap for getting the neighbors (O(1) complexity).
    std::vector<std::pair<FeatureId, double>> idDistArray;
    for (size_t i = 0; i < tempObjs.size(); ++i) {
        const auto id = tempObjs[i].second;
        const double distance = PointToLineDistance(p, getObjectByIdFn(id).line());
        auto item = std::make_pair(id, distance);
        if (idDistArray.size() < N) {
            idDistArray.push_back(item);
            std::push_heap(idDistArray.begin(), idDistArray.end(), heapComparator);
        } else if (distance < idDistArray[0].second) {
            std::pop_heap(idDistArray.begin(), idDistArray.end(), heapComparator);
            idDistArray.back() = item;
            std::push_heap(idDistArray.begin(), idDistArray.end(), heapComparator);
        }
    }

    // 5. Sort the max heap and copy the values to the final output array.
    std::sort_heap(idDistArray.begin(), idDistArray.end(), heapComparator);
    for (size_t i = 0; i < idDistArray.size(); ++i) {
        result->push_back(
            std::make_pair(getObjectByIdFn(idDistArray[i].first), idDistArray[i].second));
    }
}

使用 kNN 的自定义距离函数来提升 R 树

问题描述投票：0回答：1

1个回答

最新问题

使用 kNN 的自定义距离函数来提升 R 树

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1