How can I improve my Cython code to make it faster than NumPy's select function?


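I am trying to make my code faster than numpy.select, but it is slower: numpy.select is about twice as fast as my Cython code. I tried both large and small datasets, and in both cases numpy.select was faster (numpy.select: 11.4 ms, Cython code: 24 ms).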

I tried the approaches from the Cython documentation but could not close the gap. Here is my Cython code in full.

Packages used

import numpy as np
import pandas as pd
import cython
import random
import timeit
import time
%load_ext Cython

Dataset used

dur_m = np.random.randint(1, 1001, size=100000)
pol_year = np.random.randint(1, 1001, size=100000)
calc_flag = 1
type = np.random.choice(['IF','NB', 'NB2', 'NB3'], size = 100000)

rand = np.arange(0.01, 0.05, 0.0001)
output1 = np.random.choice(rand, size=100000)
output2 = np.random.choice(rand, size=100000)
output3 = np.random.choice(rand, size=100000)

NumPy test

def compute_np(t):
    
    condition = [
    (t > dur_m) & (t < pol_year) & (calc_flag ==1),
    (t < dur_m) & (calc_flag ==1),
    (t < pol_year)
    ]
        
    result = [
    output1,
    output2,
    output3
    ]

    default = np.array([0] * 100000)

    return np.select(condition, result, default)
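For reference, np.select returns, for each element, the choice belonging to the first condition that is True, and the default where no condition is True; that is the behaviour the Cython loop has to reproduce. A tiny example with made-up values:

```python
import numpy as np

# np.select semantics: the FIRST True condition wins; otherwise the default is used.
x = np.array([1, 5, 10, 20])
conditions = [x < 3, x < 12]            # element 0 satisfies both; the first one wins
choices = [np.full(4, 100.0), np.full(4, 200.0)]
out = np.select(conditions, choices, default=-1.0)
print(out.tolist())  # [100.0, 200.0, 200.0, -1.0]
```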

Cython code

%%cython --annotate
import cython
cimport cython
import numpy as np
cimport numpy as np
@cython.boundscheck(False)
@cython.wraparound(False)
def select_cy2(np.ndarray[np.uint8_t, ndim = 2, cast=True] conditions, double [:, ::1] choice, double [:] default_value):
    cdef int num_condition = conditions.shape[0]
    cdef int length = conditions.shape[1]
    cdef np.ndarray[np.float64_t, ndim=1] result = np.zeros(length, dtype=np.float64)
    cdef int i, j
    for j in range(length):
        for i in range(num_condition):
            if conditions[i,j]:
                result[j] = choice[i,j]
                break
            else:
                result[j] = default_value[j]  # default_value holds one entry per element, so index it with j
    return result
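Before tuning for speed, it may help to confirm that the kernel returns the same values as np.select. The sketch below is a plain-Python mirror of the select_cy2 loop, with the default looked up per element (default_value[j]) and written once, checked against np.select on small random data. select_reference is a hypothetical helper name, and it is far too slow for real use; it only serves as a correctness check.

```python
import numpy as np

def select_reference(conditions, choice, default_value):
    """Hypothetical plain-Python mirror of the select_cy2 loop (for checking only).

    conditions:    (n_cond, length) bool array
    choice:        (n_cond, length) float array
    default_value: (length,) float array
    """
    num_condition, length = conditions.shape
    result = np.empty(length, dtype=np.float64)
    for j in range(length):
        result[j] = default_value[j]      # assume no condition matches ...
        for i in range(num_condition):
            if conditions[i, j]:          # ... until the first True condition
                result[j] = choice[i, j]
                break
    return result

rng = np.random.default_rng(0)
n = 1000
conds = rng.random((3, n)) < 0.3
choices = rng.random((3, n))
default = np.zeros(n)

expected = np.select(list(conds), list(choices), default)
assert np.array_equal(select_reference(conds, choices, default), expected)
```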

Cython test

def compute_cy(t):
    
    condition = [
    (t > dur_m) & (t < pol_year) & (np.array([calc_flag]*100000) ==1), 
    (t < dur_m) & (np.array([calc_flag]*100000) ==1), 
    (t < pol_year)]
        
    result = [
    output1, 
    output2, 
    output3]

    default = np.array([0.0] * 100000)

    return select_cy2(np.array(condition), np.array(result), default)
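One thing that may be worth checking (not measured in the question, so treat it as a hypothesis): compute_cy does extra work that compute_np does not, before select_cy2 even runs. np.array([calc_flag]*100000) builds a 100,000-element Python list and converts it to an array, and this happens twice; np.array(condition) and np.array(result) then copy everything into new 2-D arrays. The sketch below times only the first difference with timeit. The value t = 500 and the two function names are illustrative assumptions, and timings will vary by machine.

```python
import timeit
import numpy as np

n = 100_000
calc_flag = 1
t = 500  # illustrative value for the argument of compute_np / compute_cy
dur_m = np.random.randint(1, 1001, size=n)

def cond_with_list():
    # As in compute_cy: a Python list of n copies of calc_flag, converted to an array.
    return (t < dur_m) & (np.array([calc_flag] * n) == 1)

def cond_with_scalar():
    # As in compute_np: the scalar comparison is simply broadcast by NumPy.
    return (t < dur_m) & (calc_flag == 1)

# Both forms produce the same boolean mask.
assert np.array_equal(cond_with_list(), cond_with_scalar())

for f in (cond_with_list, cond_with_scalar):
    secs = timeit.timeit(f, number=20) / 20
    print(f"{f.__name__}: {secs * 1e3:.3f} ms per call")
```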

Can anyone suggest ways to improve the speed?

1 answer

I haven't timed it, so this may be wrong, but you copy default_value repeatedly. Maybe:

    for j in range(length):
        result[j] = default_value[j]  # write the default once per element (i is not set yet here)
        for i in range(num_condition):
            if conditions[i,j]:
                result[j] = choice[i,j]
                break
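A different direction, if the goal is simply to match np.select with less overhead: the first-match-wins rule can also be expressed as one vectorised np.copyto pass per condition, walking the conditions in reverse so that the earliest True condition is written last and therefore wins. As far as I know this is similar in spirit to how np.select itself is implemented, but treat that as an assumption; select_copyto is a hypothetical name, and the sketch assumes all conditions and choices share one 1-D shape.

```python
import numpy as np

def select_copyto(condlist, choicelist, default):
    """Hypothetical first-match-wins selection, one vectorised pass per condition.

    Assumes every condition and choice array has the same 1-D shape.
    """
    # Writable float64 copy of the default, broadcast to the output shape.
    result = np.array(np.broadcast_to(default, condlist[0].shape), dtype=np.float64)
    # Reverse order: earlier conditions overwrite later ones, so the first True wins.
    for cond, choice in zip(reversed(condlist), reversed(choicelist)):
        np.copyto(result, choice, where=cond)
    return result

rng = np.random.default_rng(1)
n = 1000
conds = [rng.random(n) < 0.4 for _ in range(3)]
choices = [rng.random(n) for _ in range(3)]
default = np.zeros(n)

assert np.array_equal(select_copyto(conds, choices, default),
                      np.select(conds, choices, default))
```

Each pass is a single C-level loop over contiguous memory, which is one plausible reason np.select is hard to beat with a per-element loop that also has to read pre-built condition and choice arrays.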