从数组中获取更多重要点

问题描述 投票:0回答:1

我需要通过获取更重要的点来压缩 numpy 数组,我的意思是哪里有变化,但它必须是特定数量的点,例如我的数组可能有 100 个值,我需要 40 个最有意义的值。 我尝试了 rdp 算法,但你不能指定点数,你不能期望结果的长度 所以现在我这样做,但这只是随机点而不是最重要的

n=40   
indices = np.linspace(0, len(array) - 1, num=n)
python arrays compression
1个回答
0
投票

这是我在评论中概述的想法的实现。我们所做的是:我们使用常规的 Ramer–Douglas–Peucker 算法。但是,我们不是递归地分割点列表直到达到 epsilon,而是总是分割最差的子列表,直到达到剩余点的目标大小。

我确信实施需要一些工作,但基本想法似乎可行。基本算法取自维基百科,用于验证和比较。

#!/usr/bin/env python3


import heapq

import numpy as np


def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end"""
    dist = end - start
    absdist = np.linalg.norm(dist)
    if absdist == 0.:
        return np.linalg.norm(point - start)
    return np.linalg.norm(np.cross(dist, start - point)) / absdist


def douglas_peucker(pointlist, epsilon):
    if len(pointlist) < 2:
        return list(pointlist)
    # Find the point with the maximum distance
    start, end = pointlist[0], pointlist[-1]
    dmax, index = max((perpendicular_distance(point, start, end), index)
                      for index, point in enumerate(pointlist))
    if dmax < epsilon:
        return [start, end]
    rtrn = douglas_peucker(pointlist[:index], epsilon)
    rtrn += douglas_peucker(pointlist[index:], epsilon)
    return rtrn


class DprList:
    def __init__(self, pointlist, startindex):
        self.pointlist = pointlist
        self.startindex = startindex
        start, end = pointlist[0], pointlist[-1]
        self.dmax, self.index = max(
                (perpendicular_distance(point, start, end), index)
                for index, point in enumerate(pointlist))

    def result(self):
        pointlist = self.pointlist
        if len(pointlist) < 2:
            return list(pointlist)
        return [pointlist[0], pointlist[-1]]

    def __lt__(self, other):
        """Higher dmax results in lower sort order

        Required for python's heapq module
        """
        return self.dmax >= other.dmax

    def split(self):
        pointlist = self.pointlist
        index = self.index
        startindex = self.startindex
        front = DprList(pointlist[:index], startindex)
        back = DprList(pointlist[index:], startindex + index)
        return front, back


def douglas_peucker_heap(pointlist, targetcount):
    if len(pointlist) <= targetcount:
        return list(pointlist)
    heap = [DprList(pointlist, 0)]
    finished = [] # sublists with length 1
    while len(heap) * 2 + len(finished) < targetcount:
        worst = heapq.heappop(heap)
        for sublist in worst.split():
            if len(sublist.pointlist) < 2:
                finished.append(sublist)
            else:
                heapq.heappush(heap, sublist)
    heap += finished
    heap.sort(key=lambda sublist: sublist.startindex)
    return sum((sublist.result() for sublist in heap), [])


def main():
    points = np.array(
        ((0., 0.),
         (1., 2.),
         (3., -2.),
         (4., -2.),
         (5., -3.),
         (6., 5.),
         (7., 7.),
         (8., 0.)))
    print(np.asarray(douglas_peucker(points, 6.)))
    print(np.asarray(douglas_peucker_heap(points, 4)))


if __name__ == '__main__':
    main()
© www.soinside.com 2019 - 2024. All rights reserved.