numpy将排序数组合并到一个新数组?

问题描述 投票:14回答:2

有没有什么办法可以使用numpy函数做一些像mergesort中的merge一样的东西?

合并之类的功能:

a = np.array([1,3,5])
b = np.array([2,4,6])
c = merge(a, b) # c == np.array([1,2,3,4,5,6])

我希望通过numpy可以获得大数据的高性能

python sorting numpy
2个回答
10
投票

您可以使用

c = concatenate((a,b))
c.sort(kind='mergesort')

我担心你不能比这更好,除非你把你自己的排序函数写成python扩展,àlacython

有关类似问题,请参阅this问题,但仅保留合并数组中的唯一值。基准和评论也很有见地。


2
投票

当一个数组比另一个数组大得多时,通过执行np.searchorted可以获得一个不错的加速(在我的电脑上是5倍),这主要是通过搜索较小数组的插入索引来限制速度:

import numpy as np

def classic_merge(a, b):
    c = np.concatenate((a,b))
    c.sort(kind='mergesort')
    return c

def new_merge(a, b):
    if len(a) < len(b):
        b, a = a, b
    c = np.empty(len(a) + len(b), dtype=a.dtype)
    b_indices = np.arange(len(b)) + np.searchsorted(a, b)
    a_indices = np.ones(len(c), dtype=bool)
    a_indices[b_indices] = False
    c[b_indices] = b
    c[a_indices] = a
    return c

时间给出:

from timeit import timeit as t

results = []
for size_digits in range(2, 8):
    size = 10**size_digits
    # size difference of a factor 10 here makes the difference!
    a = np.arange(size // 10, dtype=np.int)
    b = np.arange(size, dtype=np.int)
    classic = t(lambda: classic_merge(a, b), number=10)
    new = t(lambda: new_merge(a, b), number=10)
    results.append((size_digits, classic, new))

if True:
    text_format = " ".join(["{:<15}"] * 3)
    print(text_format.format("log10(size)", "Classic", "New"))
    table_format = "         ".join(["{:.5f}"] * 3)
    for result in results:
        print(table_format.format(*result))

log10(size)     Classic         New            
2.00000         0.00009         0.00027
3.00000         0.00021         0.00030
4.00000         0.00233         0.00082
5.00000         0.02827         0.00601
6.00000         0.33322         0.06059
7.00000         4.40571         0.86764

当a和b大致相等时,长度差异较小:

from timeit import timeit as t

results = []
for size_digits in range(2, 8):
    size = 10**size_digits
    # same size
    a = np.arange(size , dtype=np.int)
    b = np.arange(size, dtype=np.int)
    classic = t(lambda: classic_merge(a, b), number=10)
    new = t(lambda: new_merge(a, b), number=10)
    results.append((size_digits, classic, new))

if True:
    text_format = " ".join(["{:<15}"] * 3)
    print(text_format.format("log10(size)", "Classic", "New"))
    table_format = "         ".join(["{:.5f}"] * 3)
    for result in results:
        print(table_format.format(*result))

log10(size)     Classic         New            
2.00000         0.00026         0.00087
3.00000         0.00108         0.00182
4.00000         0.01257         0.01243
5.00000         0.16333         0.12692
6.00000         1.05006         0.49186
7.00000         8.35967         5.93732
© www.soinside.com 2019 - 2024. All rights reserved.