Least_square scipy加速

Question

如何加快功能least_square？我们有六个变量（3个定向角和3个轴平移）需要优化。我们将两组3D点，平面上的两组点和投影矩阵应用于函数的输入。

dSeed = np.zeros(6)
optRes = least_squares(minReproj, dSeed, method='lm', max_nfev=600,
    args=(points_prev, points_cur, d3d_prev, d3d_cur, Proj1))

此功能使点的前后投影误差最小化。

def minReproj(dof, d2dPoints1, d2dPoints2, d3dPoints1, d3dPoints2, w2cMatrix):
    Rmat = genEulerZXZ(dof[0], dof[1], dof[2]) # my function
    translationArray = np.array([[dof[3]], [dof[4]], [dof[5]]])
    temp = np.hstack((Rmat, translationArray))
    perspectiveProj = np.vstack((temp, [0, 0, 0, 1]))

    numPoints = d2dPoints1.shape[0]
    errorA = np.zeros((numPoints,3))
    errorB = np.zeros((numPoints,3))

    forwardProj = np.matmul(w2cMatrix, perspectiveProj)
    backwardProj = np.matmul(w2cMatrix, np.linalg.inv(perspectiveProj))

    for i in range(numPoints):
        Ja = np.ones((3))
        Jb = np.ones((3))
        Wa = np.ones((4))
        Wb = np.ones((4))

        Ja[0:2] = d2dPoints1[i,:]
        Jb[0:2] = d2dPoints2[i,:]
        Wa[0:3] = d3dPoints1[i,:]
        Wb[0:3] = d3dPoints2[i,:]

        JaPred = np.matmul(forwardProj, Wb)
        JaPred /= JaPred[-1]
        e1 = Ja - JaPred

        JbPred = np.matmul(backwardProj, Wa)
        JbPred /= JbPred[-1]
        e2 = Jb - JbPred

        errorA[i,:] = e1
        errorB[i,:] = e2

    residual = np.vstack((errorA,errorB))
    return residual.flatten()

def genEulerZXZ(psi, theta, sigma):
    c1 = cos(psi)
    s1 = sin(psi)
    c2 = cos(theta)
    s2 = sin(theta)
    c3 = cos(sigma)
    s3 = sin(sigma)

    mat = np.zeros((3,3))

    mat[0,0] = (c1 * c3) - (s1 * c2 * s3)
    mat[0,1] = (-c1 * s3) - (s1 * c2 * c3)
    mat[0,2] = (s1 * s2)

    mat[1,0] = (s1 * c3) + (c1 * c2 * s3)
    mat[1,1] = (-s1 * s3) + (c1 * c2 * c3)
    mat[1,2] = (-c1 * s2)

    mat[2,0] = (s2 * s3)
    mat[2,1] = (s2 * c3)
    mat[2,2] = c2

    return mat

此优化过程需要0.2到0.4秒，这太多了。也许您知道如何加快此过程？也许还有另一种方法可以找到两个点集的相对旋转和平移？对于rpoleski：

       96    0.023    0.000   19.406    0.202 /usr/local/lib/python3.7/dist-packages/scipy/optimize/_lsq/least_squares.py:240(least_squares)
     4548    0.132    0.000   18.986    0.004 /usr/local/lib/python3.7/dist-packages/scipy/optimize/_lsq/least_squares.py:801(fun_wrapped)
       96    0.012    0.000   18.797    0.196 /usr/local/lib/python3.7/dist-packages/scipy/optimize/_lsq/least_squares.py:42(call_minpack)
     4548   11.102    0.002   18.789    0.004 /home/pi/helperFunctions.py:29(minimizeReprojection)

Answer 1

[很可能在scipy.optimize.least_squares()期间，大部分时间用于计算残差，在您的情况下，这些残差采用scipy.optimize.least_squares()中代码的形式。

但是，您在minReproj()中提供的代码似乎具有次优的内存管理，可以通过预先分配来大大改善该管理。这将显着提高速度。

例如，minReproj()和vstack()由于必须将其参数复制到其最终结果的内存中而遭受重大损失。考虑一下hstack()的前几行，我将它们重新组合在minReproj()中。这些可以用更好的时序重写为gen_affine_OP()（我也重写了gen_affine()以不分配新的内存）：

gen_euler_zxz()

类似地，可以通过定义较大的残差并为其提供import numpy as np from math import sin, cos def gen_euler_zxz(result, psi, theta, sigma): c1 = cos(psi) s1 = sin(psi) c2 = cos(theta) s2 = sin(theta) c3 = cos(sigma) s3 = sin(sigma) result[0,0] = (c1 * c3) - (s1 * c2 * s3) result[0,1] = (-c1 * s3) - (s1 * c2 * c3) result[0,2] = (s1 * s2) result[1,0] = (s1 * c3) + (c1 * c2 * s3) result[1,1] = (-s1 * s3) + (c1 * c2 * c3) result[1,2] = (-c1 * s2) result[2,0] = (s2 * s3) result[2,1] = (s2 * c3) result[2,2] = c2 return result def gen_affine(dof): result = np.zeros((4, 4), dtype=np.float) gen_euler_zxz(result[:3, :3], dof[0], dof[1], dof[2]) result[:3, 3] = dof[3:] result[3, 3] = 1 return result def gen_affine_OP(dof): Rmat = gen_euler_zxz(np.empty((3, 3)), dof[0], dof[1], dof[2]) translationArray = np.array([[dof[3]], [dof[4]], [dof[5]]]) temp = np.hstack((Rmat, translationArray)) return np.vstack((temp, [0, 0, 0, 1])) dof = 1, 2, 3, 4, 5, 6 %timeit gen_affine_OP(dof) # 100000 loops, best of 3: 16.6 µs per loop %timeit gen_affine(dof) # 100000 loops, best of 3: 5.02 µs per loop和residual = np.vstack((errorA,errorB))的视图来避免errorA调用。

另一个示例是在创建errorB，Ja，Jb，Wa时：

Wb

此外，def func_OP(numPoints): for i in range(numPoints): Ja = np.ones((3)) Jb = np.ones((3)) Wa = np.ones((4)) Wb = np.ones((4)) def func(n): Ja = np.empty(3) Jb = np.empty(3) Wa = np.empty(3) Wb = np.empty(3) for i in range(n): Ja[:] = 1 Jb[:] = 1 Wa[:] = 1 Wb[:] = 1 %timeit func_OP(1000) # 100 loops, best of 3: 9.31 ms per loop %timeit func(1000) # 100 loops, best of 3: 2.2 ms per loop正在制作您不需要的副本，而.flatten()只会返回一个视图，但这足以满足您的需求，并且显示速度更快：

.ravel()

最后的评论涉及主循环的加速。我没有为此设计简单的向量化重写，但是您可以使用Numba加快处理速度（只要它以非对象模式编译即可）。

最终，修改后的a = np.ones((300, 300)) %timeit a.ravel() # 1000000 loops, best of 3: 215 ns per loop %timeit a.flatten() # 10000 loops, best of 3: 113 µs per loop看起来像是：

minReproj()

请仔细检查它是否产生与您的代码相同的结果。

Least_square scipy加速

问题描述投票：0回答：1

1个回答

最新问题

Least_square scipy加速

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1