Is there a way to speed up this numba function? An equivalent C++ implementation compiled with -O3 is 10x faster


The function get_cum_dist is defined as follows:

from numba import njit
import numpy as np

@njit(fastmath=True)
def get_cum_dist(perm: np.ndarray, c: np.ndarray, n: int) -> np.ndarray:
    cum_dist = np.empty(n)
    cum_dist[0] = 0.
    cum_dist[1] = 0.
    for i in range(1, n - 1):
        cum_dist[i + 1] = cum_dist[i] + c[perm[i - 1], perm[i]]
    return cum_dist

It is called with the following input:

n = 1000
perm = np.random.permutation(n)
c = np.random.random((n+1,n+1))
cum_dist = get_cum_dist(perm, c, n)

This function is called many times in my algorithm, so any suggestions for a potential speedup would be highly appreciated!
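
When trying faster variants, it helps to have a reference to check results against. The following is a minimal sketch, not a claim about what is fastest: it restates the loop above as a gather of the n - 2 edge costs followed by a prefix sum. The names get_cum_dist_reference and get_cum_dist_vectorized are hypothetical, and the sketch assumes len(perm) == n and n >= 2.

```python
import numpy as np


def get_cum_dist_reference(perm, c, n):
    """Plain-Python copy of the loop from the question (no numba), used as ground truth."""
    cum_dist = np.empty(n)
    cum_dist[0] = 0.0
    cum_dist[1] = 0.0
    for i in range(1, n - 1):
        cum_dist[i + 1] = cum_dist[i] + c[perm[i - 1], perm[i]]
    return cum_dist


def get_cum_dist_vectorized(perm, c, n):
    """Hypothetical vectorized equivalent: gather the edge costs, then prefix-sum them."""
    cum_dist = np.zeros(n)
    # Edge costs c[perm[0], perm[1]], ..., c[perm[n - 3], perm[n - 2]]
    edge_costs = c[perm[: n - 2], perm[1 : n - 1]]
    # cum_dist[0] and cum_dist[1] stay 0, matching the original function
    cum_dist[2:] = np.cumsum(edge_costs)
    return cum_dist


rng = np.random.default_rng(0)
n = 1000
perm = rng.permutation(n)
c = rng.random((n + 1, n + 1))
assert np.allclose(get_cum_dist_reference(perm, c, n),
                   get_cum_dist_vectorized(perm, c, n))
```

Whether the vectorized form beats the jitted loop would need to be measured; its main use here is to confirm that an optimized version still returns the same values.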

Here is an attempt in C++, without the random permutation input (I'm a C++ newbie). When I compile this code with the -O3 flag, I find it is 10x faster than the numba version.

#include <iostream>
using namespace std;
#include <bits/stdc++.h>
#include <chrono>
using namespace std::chrono;

int main() {
    int n, sum = 0;
    int rows = 1001;
    int cols = 1001;
    int randArr[rows][cols];
    for (int i=0;i<rows;i++)
        for (int j=0; j<cols; j++)
            randArr[i][j] = 1 + (rand() % 500);
    n = 1000;
    int arr[1002]={0};  // the loop below writes arr[2]..arr[n+1], so n+2 elements are needed
    auto start = high_resolution_clock::now();
    for (int i = 1; i <= n; ++i) {
        arr[i+1]=arr[i] + randArr[i-1][i];
    }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<nanoseconds>(stop - start);
    cout << duration.count() << endl;
    return 0;
}

Tags: c++ performance numba
1 Answer

@463035818_is_not_an_ai Thanks a lot! I made some changes to the C++ code, and it should now produce meaningful timings.

#include <iostream>
using namespace std;
#include <bits/stdc++.h>
#include <chrono>
using namespace std::chrono;
#include <algorithm>
#include <vector>
#include <cstdlib>

int main() {
    int n, sum = 0;
    int rows = 1001;
    int cols = 1001;
    int randArr[rows][cols];
    for (int i=0;i<rows;i++)
        for (int j=0; j<cols; j++)
            randArr[i][j] = 1 + (rand() % 500);
    vector<int> myvector;
    n = 1000;
    for (int i=0;i<n;++i) myvector.push_back(i);  // n elements (0..n-1), so myvector[n-1] in the timed loop is in bounds
    std::random_shuffle(myvector.begin(), myvector.end());
    int arr[1001]={0};
    auto start = high_resolution_clock::now();
    for (int i = 1; i < n; ++i) {
        arr[i+1]=arr[i] + randArr[myvector[i-1]][myvector[i]];
    }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<nanoseconds>(stop - start);
    cout << duration.count() << endl;
    cout << arr[1000] << endl;
    return 0;
}

I print a value from arr so that the result is actually used, which hopefully keeps the compiler from optimizing away the timed section.
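
For the Python side of the comparison, a similar precaution applies: a numba-jitted function is compiled on its first call, so that call should be kept out of the measurement. Below is a minimal timing sketch using only the standard library. The helper time_call and the stand-in workload prefix_sums are hypothetical names introduced for illustration.

```python
import time


def time_call(fn, *args, repeats=1000):
    """Return the best wall-clock time in nanoseconds of fn(*args) over `repeats` runs."""
    # Warm-up call: for a numba-jitted function this is where JIT compilation happens,
    # so it is deliberately left out of the timed runs.
    fn(*args)
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        fn(*args)
        elapsed = time.perf_counter_ns() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def prefix_sums(values):
    """Stand-in workload (hypothetical) so the sketch runs without numba installed."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


best_ns = time_call(prefix_sums, [0.5] * 1000, repeats=200)
print(f"best of 200 runs: {best_ns} ns")
```

With numba available, the same helper could be used as time_call(get_cum_dist, perm, c, n). Reporting the minimum over many runs gives a number that is closer in spirit to the single nanosecond reading printed by the C++ program, although the C++ version here still sums int values while the numba version sums float64 values.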
