我需要通过获取更重要的点来压缩 numpy 数组,我的意思是哪里有变化,但它必须是特定数量的点,例如我的数组可能有 100 个值,我需要 40 个最有意义的值。 我尝试了 rdp 算法,但你不能指定点数,你不能期望结果的长度 所以现在我这样做,但这只是随机点而不是最重要的
n=40
indices = np.linspace(0, len(array) - 1, num=n)
这是我在评论中概述的想法的实现。我们所做的是:我们使用常规的 Ramer–Douglas–Peucker 算法。但是,我们不是递归地分割点列表直到达到 epsilon,而是总是分割最差的子列表,直到达到剩余点的目标大小。
我确信实施需要一些工作,但基本想法似乎可行。基本算法取自维基百科,用于验证和比较。
#!/usr/bin/env python3
import heapq
import numpy as np
def perpendicular_distance(point, start, end):
"""Distance from point to the line through start and end"""
dist = end - start
absdist = np.linalg.norm(dist)
if absdist == 0.:
return np.linalg.norm(point - start)
return np.linalg.norm(np.cross(dist, start - point)) / absdist
def douglas_peucker(pointlist, epsilon):
if len(pointlist) < 2:
return list(pointlist)
# Find the point with the maximum distance
start, end = pointlist[0], pointlist[-1]
dmax, index = max((perpendicular_distance(point, start, end), index)
for index, point in enumerate(pointlist))
if dmax < epsilon:
return [start, end]
rtrn = douglas_peucker(pointlist[:index], epsilon)
rtrn += douglas_peucker(pointlist[index:], epsilon)
return rtrn
class DprList:
def __init__(self, pointlist, startindex):
self.pointlist = pointlist
self.startindex = startindex
start, end = pointlist[0], pointlist[-1]
self.dmax, self.index = max(
(perpendicular_distance(point, start, end), index)
for index, point in enumerate(pointlist))
def result(self):
pointlist = self.pointlist
if len(pointlist) < 2:
return list(pointlist)
return [pointlist[0], pointlist[-1]]
def __lt__(self, other):
"""Higher dmax results in lower sort order
Required for python's heapq module
"""
return self.dmax >= other.dmax
def split(self):
pointlist = self.pointlist
index = self.index
startindex = self.startindex
front = DprList(pointlist[:index], startindex)
back = DprList(pointlist[index:], startindex + index)
return front, back
def douglas_peucker_heap(pointlist, targetcount):
if len(pointlist) <= targetcount:
return list(pointlist)
heap = [DprList(pointlist, 0)]
finished = [] # sublists with length 1
while len(heap) * 2 + len(finished) < targetcount:
worst = heapq.heappop(heap)
for sublist in worst.split():
if len(sublist.pointlist) < 2:
finished.append(sublist)
else:
heapq.heappush(heap, sublist)
heap += finished
heap.sort(key=lambda sublist: sublist.startindex)
return sum((sublist.result() for sublist in heap), [])
def main():
points = np.array(
((0., 0.),
(1., 2.),
(3., -2.),
(4., -2.),
(5., -3.),
(6., 5.),
(7., 7.),
(8., 0.)))
print(np.asarray(douglas_peucker(points, 6.)))
print(np.asarray(douglas_peucker_heap(points, 4)))
if __name__ == '__main__':
main()