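I'm new to ML and have been trying to implement a neural network in Python, but when I call SciPy's `minimize` function with the TNC method, I get the following error:

```
ValueError: tnc: invalid gradient vector.
```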
I looked into it and found this in the source code:
```c
arr_grad = (PyArrayObject *)PyArray_FROM_OTF((PyObject *)py_grad, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY);
if (arr_grad == NULL)
{
    PyErr_SetString(PyExc_ValueError, "tnc: invalid gradient vector.");
    goto failure;
}
```
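The check above fails when NumPy cannot turn whatever the gradient function returned into an array of doubles. A minimal NumPy-only sketch of one way that can happen, assuming the gradient is assembled from per-layer matrices shaped as described below in the question (whether this is exactly what went wrong here is an assumption): a ragged list of matrices cannot become a float64 array, while flattening each layer and concatenating can.

```python
import numpy as np

# Per-layer gradients for a [400 25 10] network (bias column included),
# using the shapes (25, 401) and (10, 26) assumed from the question.
grad_layers = [np.zeros((25, 401)), np.zeros((10, 26))]

# A ragged list cannot be converted to one float64 array, which is roughly
# what PyArray_FROM_OTF(..., NPY_DOUBLE, ...) attempts with the gradient.
try:
    np.asarray(grad_layers, dtype=np.float64)
    ragged_ok = True
except ValueError:
    ragged_ok = False

# Flattening each layer and concatenating gives one contiguous 1-D vector.
flat_grad = np.concatenate([g.ravel() for g in grad_layers])
print(ragged_ok, flat_grad.shape, flat_grad.dtype)  # False (10285,) float64
```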
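Edit: Here is my implementation of backpropagation and the cost function, written as methods of a Network class I created. I'm currently using a [400 25 10] architecture, similar to the one in Andrew Ng's Machine Learning course on Coursera.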
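```python
def cost_function(self, theta, x, y):
    # minimize passes theta as a flat 1-D vector, so roll it into the
    # per-layer weight matrices first (as backprop does)
    theta = self.rollPara(np.asmatrix(theta))
    u = self.num_layers
    m = len(x)
    Reg = 0  # regularization term: init and calculation
    for i in range(u - 1):
        k = np.power(theta[i], 2)
        Reg = Reg + np.sum(k)
    Reg = lmbda / (2 * m) * Reg  # lmbda: regularization strength, defined outside the class
    h = self.forwardprop(x)[-1]  # activation of the last layer (note: theta is not passed to forwardprop)
    J = (-1 / m) * np.sum(np.multiply(y, np.log(h)) + np.multiply((1 - y), np.log(1 - h))) + Reg  # cost function
    return J
```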
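Because `minimize` hands the cost function the flat vector it is optimizing, `theta[i]` in the regularization loop indexes single numbers unless the vector is rolled into matrices first. A minimal self-contained sketch of the same regularized cross-entropy cost working on a flat vector, assuming sigmoid activations and the layout [(25, 401), (10, 26)]; `roll`, `unroll`, `forward`, and `cost` are hypothetical helpers, not the question's Network methods. Unlike the question's version, it leaves the bias column out of the penalty, following the Coursera convention.

```python
import numpy as np

# Assumed layout for a [400 25 10] network: one weight matrix per layer,
# with an extra first column for the bias unit (25*401 + 10*26 = 10285).
SHAPES = [(25, 401), (10, 26)]


def roll(theta, shapes=SHAPES):
    """Split a flat 1-D parameter vector into per-layer weight matrices."""
    mats, start = [], 0
    for rows, cols in shapes:
        mats.append(theta[start:start + rows * cols].reshape(rows, cols))
        start += rows * cols
    return mats


def unroll(mats):
    """Flatten per-layer matrices (np.matrix included) into one 1-D float64 vector."""
    return np.concatenate([np.asarray(w, dtype=np.float64).ravel() for w in mats])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(weights, X):
    """Return the output-layer activations for every row of X."""
    a = X
    for w in weights:
        a = np.hstack([np.ones((a.shape[0], 1)), a])  # prepend the bias unit
        a = sigmoid(a @ w.T)
    return a


def cost(theta, X, Y, lmbda):
    """Regularized cross-entropy cost of the flat vector minimize passes in."""
    m = X.shape[0]
    weights = roll(theta)
    h = forward(weights, X)
    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # Column 0 (bias weights) is left out of the penalty.
    reg = lmbda / (2 * m) * sum(np.sum(w[:, 1:] ** 2) for w in weights)
    return J + reg
```

With all-zero weights every output unit is 0.5, so for 10 output units the cost should come out to 10·ln 2 ≈ 6.93 whatever the data, which makes a quick sanity check.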
```python
def backprop(self, theta, x, y):
    m = len(x)  # number of training examples
    theta = np.asmatrix(theta)
    theta = self.rollPara(theta)  # roll weights into matrices: original shape (1, 10285), after rolling [(25, 401), (10, 26)]
    tot_delta = list(range(self.num_layers - 1))  # accumulated error init
    delta = list(range(self.num_layers - 1))  # per-example error init
    for i in range(m):  # loop over training examples
        a = self.forwardprop(x[i:i+1, :])  # activations of each layer for the ith example
        delta[-1] = a[-1] - y[i]  # output-layer error for the ith example
        for j in range(1, self.num_layers - 1):  # error of each hidden layer, back to front
            theta_ = theta[-1-j+1][:, 1:]  # weights of layer j counted from the back ('-1' is the last element), bias column excluded
            act = (a[:-1])[-1-j+1][:, 1:]  # activation of the current layer (output layer and bias units excluded)
            delta_prv = delta[-1-j+1]  # error of the following layer
            # multiply by the sigmoid derivative a * (1 - a), assuming sigmoid activations
            delta[-1-j] = np.multiply(delta_prv @ theta_, np.multiply(act, 1 - act))  # error of the current layer
        delta = delta[::-1]  # reverse the order, since BP runs from back to front
        for j in range(self.num_layers - 1):  # add the ith example's error to the accumulated error
            tot_delta[j] = tot_delta[j] + np.transpose(delta[j]) @ a[self.num_layers-2-j]  # layer j error of example i
    # note: tot_delta holds matrices of different shapes, so np.asarray() cannot build one
    # numeric array from them (older NumPy makes an object array, NumPy >= 1.24 raises)
    ThetaGrad = np.add((1/m)*np.asarray(tot_delta[::-1]), (lmbda/m)*np.asarray(theta))  # gradient
    grad = self.unrollPara(ThetaGrad)
    return grad
```
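A vectorized sketch of the same backpropagation (it swaps the per-example loop for matrix operations) that returns the gradient as one flat 1-D float64 vector, plus a central-difference gradient check on a tiny [4 3 2] network. It assumes sigmoid activations and exactly one hidden layer; `roll` and `cost_and_grad` are hypothetical names, not the question's methods.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def roll(theta, sizes):
    """Split flat theta into matrices of shape (n_out, n_in + 1), one per layer."""
    mats, start = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        n = n_out * (n_in + 1)
        mats.append(theta[start:start + n].reshape(n_out, n_in + 1))
        start += n
    return mats


def cost_and_grad(theta, X, Y, sizes, lmbda):
    """Regularized cost and its gradient as one flat 1-D float64 vector.

    Assumes exactly one hidden layer: sizes = [n_in, n_hidden, n_out].
    """
    m = X.shape[0]
    W1, W2 = roll(theta, sizes)
    a1 = np.hstack([np.ones((m, 1)), X])                    # input + bias
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ W1.T)])   # hidden + bias
    h = sigmoid(a2 @ W2.T)                                  # output layer
    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    J += lmbda / (2 * m) * (np.sum(W1[:, 1:] ** 2) + np.sum(W2[:, 1:] ** 2))

    d3 = h - Y                                              # output error
    # hidden error: back-propagate through W2 (bias column dropped) and
    # multiply by the sigmoid derivative a * (1 - a)
    d2 = (d3 @ W2[:, 1:]) * a2[:, 1:] * (1 - a2[:, 1:])
    G1 = d2.T @ a1 / m
    G2 = d3.T @ a2 / m
    G1[:, 1:] += lmbda / m * W1[:, 1:]                      # bias not regularized
    G2[:, 1:] += lmbda / m * W2[:, 1:]
    return J, np.concatenate([G1.ravel(), G2.ravel()])


# Gradient check against central differences on a tiny [4 3 2] network.
rng = np.random.default_rng(0)
sizes = [4, 3, 2]
n_params = 3 * 5 + 2 * 4
theta = rng.normal(scale=0.5, size=n_params)
X = rng.normal(size=(6, 4))
Y = np.eye(2)[rng.integers(0, 2, size=6)]
J, grad = cost_and_grad(theta, X, Y, sizes, lmbda=0.1)

eps = 1e-5
numeric = np.array([
    (cost_and_grad(theta + eps * e, X, Y, sizes, 0.1)[0]
     - cost_and_grad(theta - eps * e, X, Y, sizes, 0.1)[0]) / (2 * eps)
    for e in np.eye(n_params)
])
max_err = np.max(np.abs(numeric - grad))
print(grad.shape, grad.dtype, max_err)  # max_err should be tiny
```

Running the same finite-difference comparison against the class's own `backprop` is a cheap way to find out whether the implementation is correct, independently of the TNC error.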
```python
import scipy.optimize as op

maxiter = 500
options = {'maxiter': maxiter}
initTheta = N.unrollPara(N.weights)  # flatten the weight matrices into one vector
res = op.minimize(fun=N.cost_function, x0=initTheta, jac=N.backprop, method='tnc', args=(x, Y), options=options)  # x, Y: training set, already initialized
```
Here is the SciPy source code.
Thanks in advance.
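After reading the code carefully, I realized the gradient vector had to be a list rather than a NumPy array. I'm not sure whether my implementation is correct, but the error has gone away.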
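For comparison, a minimal sketch on a toy least-squares problem (the objective, data, and the name `fun_and_grad` are hypothetical stand-ins, not the network above) suggests TNC accepts a plain 1-D float64 NumPy array from the gradient function. It also uses `jac=True`, which lets a single function return both the cost and the gradient.

```python
import numpy as np
from scipy.optimize import minimize


def fun_and_grad(w, A, b):
    """Least-squares objective 0.5*||Aw - b||^2 and its gradient A.T(Aw - b)."""
    r = A @ w - b
    return 0.5 * r @ r, A.T @ r  # gradient is a 1-D float64 ndarray


rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

res = minimize(fun_and_grad, x0=np.zeros(5), args=(A, b), jac=True,
               method='TNC', options={'maxfun': 500})
w_ls = np.linalg.lstsq(A, b, rcond=None)[0]  # exact least-squares solution
print(res.x, np.max(np.abs(res.x - w_ls)))
```

If this runs cleanly, the shape and dtype of what `backprop` returns (a single flat numeric vector versus a ragged or object array), rather than list versus ndarray as such, is likely the thing worth checking.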