Strange performance of NumPy array2string

Question

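I am using NumPy's array2string function to convert the values in an array to strings so they can be written to an ASCII file. It is simple and relatively fast for large arrays, and it is faster than native Python string formatting done in a loop or with map.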

aa = np.array2string(array.flatten(), precision=precision, separator=' ', max_line_width=(precision + 4) * ncolumns, prefix='         ', floatmode='fixed')

aa =  '         ' + aa[1:-1] + '\n'

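However, while testing with small arrays I noticed some strange results when the array has fewer than a few thousand elements. I ran a quick comparison against a native Python approach using map and join, and that approach performs as I expected: it gets slower as the array becomes fairly large, and it is faster for very small arrays because of the overhead of the NumPy function.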

I ran the benchmark with perfplot to show what I mean:

Does anyone know what causes the odd spike in the numpy.array2string method above? A (100, 3) array is actually slower than a (500000, 3) array.

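I am just curious about what is going on. The numpy solution is still the best choice for the sizes my data is likely to have (> 1000), but the spike seems strange.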
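One thing that may be worth checking here, offered as a sketch and not as a confirmed diagnosis of the plot: array2string abbreviates an array with '...' once its size exceeds the threshold print option (1000 by default), so a large array may have only a handful of its elements formatted. The array sizes and variable names below are illustrative, and the check assumes the default print options are in effect.

```python
import sys
import numpy as np

# Illustrative sizes: one below and one above the default threshold of 1000.
small = np.random.random([100, 3]).flatten()    # 300 elements
large = np.random.random([5000, 3]).flatten()   # 15000 elements

kwargs = dict(precision=16, separator=' ', floatmode='fixed')

s_small = np.array2string(small, **kwargs)
s_large = np.array2string(large, **kwargs)
# Passing threshold=sys.maxsize asks for every element to be written out.
s_full = np.array2string(large, threshold=sys.maxsize, **kwargs)

print('...' in s_small)              # is the small array abbreviated?
print('...' in s_large)              # is the large array abbreviated?
print(len(s_large), len(s_full))     # abbreviated vs. full string length
```

If the large array turns out to be abbreviated, the benchmark would be timing the formatting of only a few edge items for large inputs, and the resulting string would not contain the full data. Passing threshold=sys.maxsize makes the comparison like for like.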

Update - added the full code

Here is the complete script I ran on my computer:

import numpy as np
import perfplot


precision = 16
ncolumns = 6

# numpy method
def numpystring(array, precision, ncolumns):
    indent = '          '
    aa = np.array2string(array.flatten(), precision=precision, separator=' ', max_line_width=(precision + 6) * ncolumns,
                     prefix='         ', floatmode='fixed')
    return indent + aa[1:-1] + '\n'

# native python string creation
def nativepython_string(array, precision, ncolumns):
    fmt = '{' + f":.{precision}f" + '}'
    data_str = ''

    # calculate number of full rows
    if array.size <= ncolumns:
        nrows = 1
    else:
        nrows = int(array.size / ncolumns)

    # write full rows
    for row in range(nrows):
        shift = row * ncolumns
        data_str += '          ' + ' '.join(
            map(lambda x: fmt.format(x), array.flatten()[0 + shift:ncolumns + shift])) + '\n'

    # write any remaining data in last non-full row
    if array.size > ncolumns and array.size % ncolumns != 0:
        data_str += '          ' + ' '.join(
            map(lambda x: fmt.format(x), array.flatten()[ncolumns + shift::])) + '\n'

    return data_str

# Benchmark methods
out = perfplot.bench(
    setup=lambda n: np.random.random([n,3]),  # setup random nx3 array
    kernels=[
        lambda a: nativepython_string(a, precision, ncolumns),
        lambda a: numpystring(a, precision, ncolumns)
    ],
    equality_check=None,
    labels=["Native", "NumPy"],
    n_range=[2**k for k in range(16)],
    xlabel="Number of vectors [Nr.]",
    title="String Conversion Performance"

)

out.show(
    time_unit="us",  # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
)
out.save("perf.png", transparent=True, bbox_inches="tight")

Hope this helps.

Answer 1

An example of using savetxt with a small 2-D array:

In [87]: np.savetxt('test.txt', np.arange(24).reshape(3,8), fmt='%5d')
In [88]: cat test.txt
    0     1     2     3     4     5     6     7
    8     9    10    11    12    13    14    15
   16    17    18    19    20    21    22    23

In [90]: np.savetxt('test.txt', np.arange(24).reshape(3,8), fmt='%5d', newline=' ')
In [91]: cat test.txt
    0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23

It builds a fmt string from the fmt argument and the number of columns:

In [95]: fmt=' '.join(['%5d']*8)
In [96]: fmt
Out[96]: '%5d %5d %5d %5d %5d %5d %5d %5d'

It then formats each row with that string and writes the resulting line to the file:

In [97]: fmt%tuple(np.arange(8))
Out[97]: '    0     1     2     3     4     5     6     7'
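The same idea can be applied to the layout from the question. The following is a minimal sketch, not the original poster's method: the helper name savetxt_string and its default arguments are hypothetical, and it assumes a NumPy version in which savetxt can write to a text buffer such as io.StringIO. It reshapes the flattened data into full rows of ncolumns values, lets savetxt format those rows, and formats any leftover values separately.

```python
import io
import numpy as np

def savetxt_string(array, precision=16, ncolumns=6, indent=' ' * 10):
    """Format a flattened array as indented rows of ncolumns values (hypothetical helper)."""
    flat = np.asarray(array).ravel()
    nfull = (flat.size // ncolumns) * ncolumns   # number of values that fill complete rows
    fmt = f'%.{precision}f'
    buf = io.StringIO()

    if nfull:
        # One multi-format string per row; savetxt applies it to each row in turn.
        row_fmt = indent + ' '.join([fmt] * ncolumns)
        np.savetxt(buf, flat[:nfull].reshape(-1, ncolumns), fmt=row_fmt)

    if nfull < flat.size:
        # Remaining values that do not fill a complete row.
        buf.write(indent + ' '.join(fmt % x for x in flat[nfull:]) + '\n')

    return buf.getvalue()

print(savetxt_string(np.arange(8, dtype=float), precision=2))
```

Because every value goes through the fmt string, the output is never abbreviated, whatever the array size.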
