有没有什么办法可以使用numpy函数做一些像mergesort中的merge一样的东西?
合并之类的功能:
a = np.array([1,3,5])
b = np.array([2,4,6])
c = merge(a, b) # c == np.array([1,2,3,4,5,6])
我希望通过numpy可以获得大数据的高性能
您可以使用
c = concatenate((a,b))
c.sort(kind='mergesort')
我担心你不能比这更好,除非你把你自己的排序函数写成python扩展,àlacython
。
有关类似问题,请参阅this问题,但仅保留合并数组中的唯一值。基准和评论也很有见地。
当一个数组比另一个数组大得多时,通过执行np.searchorted可以获得一个不错的加速(在我的电脑上是5倍),这主要是通过搜索较小数组的插入索引来限制速度:
import numpy as np
def classic_merge(a, b):
c = np.concatenate((a,b))
c.sort(kind='mergesort')
return c
def new_merge(a, b):
if len(a) < len(b):
b, a = a, b
c = np.empty(len(a) + len(b), dtype=a.dtype)
b_indices = np.arange(len(b)) + np.searchsorted(a, b)
a_indices = np.ones(len(c), dtype=bool)
a_indices[b_indices] = False
c[b_indices] = b
c[a_indices] = a
return c
时间给出:
from timeit import timeit as t
results = []
for size_digits in range(2, 8):
size = 10**size_digits
# size difference of a factor 10 here makes the difference!
a = np.arange(size // 10, dtype=np.int)
b = np.arange(size, dtype=np.int)
classic = t(lambda: classic_merge(a, b), number=10)
new = t(lambda: new_merge(a, b), number=10)
results.append((size_digits, classic, new))
if True:
text_format = " ".join(["{:<15}"] * 3)
print(text_format.format("log10(size)", "Classic", "New"))
table_format = " ".join(["{:.5f}"] * 3)
for result in results:
print(table_format.format(*result))
log10(size) Classic New
2.00000 0.00009 0.00027
3.00000 0.00021 0.00030
4.00000 0.00233 0.00082
5.00000 0.02827 0.00601
6.00000 0.33322 0.06059
7.00000 4.40571 0.86764
当a和b大致相等时,长度差异较小:
from timeit import timeit as t
results = []
for size_digits in range(2, 8):
size = 10**size_digits
# same size
a = np.arange(size , dtype=np.int)
b = np.arange(size, dtype=np.int)
classic = t(lambda: classic_merge(a, b), number=10)
new = t(lambda: new_merge(a, b), number=10)
results.append((size_digits, classic, new))
if True:
text_format = " ".join(["{:<15}"] * 3)
print(text_format.format("log10(size)", "Classic", "New"))
table_format = " ".join(["{:.5f}"] * 3)
for result in results:
print(table_format.format(*result))
log10(size) Classic New
2.00000 0.00026 0.00087
3.00000 0.00108 0.00182
4.00000 0.01257 0.01243
5.00000 0.16333 0.12692
6.00000 1.05006 0.49186
7.00000 8.35967 5.93732