With the introduction of TensorFlow 2.0, the SciPy interface (tf.contrib.opt.ScipyOptimizerInterface) has been removed. However, I still want to use the SciPy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (a Keras Sequential model). For the optimizer to work, it needs as input a function fun(x0), where x0 is an array of shape (n,). Therefore, the first step is to flatten the weight matrices to obtain a vector of the required shape. To this end, I modified the code provided at https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/, which provides a function factory meant to create such a function fun(x0). However, the code does not seem to work and the loss function does not decrease. I would be very grateful if somebody could help me work this out.
Here is the snippet of code I am using:

import numpy as np
import scipy.optimize
import tensorflow as tf

func = function_factory(model, loss_function, x_u_train, u_train)

# convert the initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)

# train the model with the L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')
def loss_function(x_u_train, u_train, network):
    # mean squared error between the network prediction and the target
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)
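As a quick sanity check of the loss on toy data (the model and shapes here are hypothetical, not the actual training set):

check_model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
x_toy = tf.random.normal((16, 2))
u_toy = tf.random.normal((16, 1))
print(float(loss_function(x_toy, u_toy, check_model)))  # a non-negative scalar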
def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create the objective function required by scipy.optimize.minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss_f [in]: a function with signature loss_value = loss_f(x_u_train, u_train, model).
        x_u_train [in]: the input part of the training data.
        u_train [in]: the output part of the training data.

    Returns:
        A function with the signature:
            loss_value = f(model_parameters).
    """

    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare the required information first
    count = 0
    idx = []   # stitch indices
    part = []  # partition indices
    for i, shape in enumerate(shapes):
        n = np.product(shape)
        idx.append(tf.reshape(tf.range(count, count + n, dtype=tf.int32), shape))
        part.extend([i] * n)
        count += n

    part = tf.constant(part)

    def assign_new_model_parameters(params_1d):
        """Updates the model's parameters from a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """
        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):
            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create the function that will be returned by this factory
    def f(params_1d):
        """A function created by function_factory.

        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss value.
        """
        # update the parameters of the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)

        # print out the iteration count & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)
        return loss_value

    # store this information as members so we can use it outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f
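For reference, here is a minimal sketch of the stitch/partition round trip the factory relies on. The variables and shapes are toy examples, not the actual model, and tf/np are assumed imported as above:

# toy demonstration of the flatten/unflatten mechanism (hypothetical shapes)
a = tf.Variable(tf.ones((2, 3)))   # e.g. a kernel
b = tf.Variable(tf.zeros((3,)))    # e.g. a bias

demo_shapes = tf.shape_n([a, b])
count, demo_idx, demo_part = 0, [], []
for i, shape in enumerate(demo_shapes):
    n = int(np.product(shape))
    demo_idx.append(tf.reshape(tf.range(count, count + n, dtype=tf.int32), shape))
    demo_part.extend([i] * n)
    count += n

flat = tf.dynamic_stitch(demo_idx, [a, b])                     # 1D tensor of shape (9,)
pieces = tf.dynamic_partition(flat, tf.constant(demo_part), 2)
restored = [tf.reshape(p, s) for p, s in zip(pieces, demo_shapes)]  # original shapes again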
Here, model is a tf.keras.Sequential object. Thanks in advance for your help!
You need to set jac=True in scipy.optimize.minimize. I tested the Python code from the original Gist and replaced tfp.optimizer.lbfgs_minimize with the SciPy optimizer. It works with the BFGS method:

results = scipy.optimize.minimize(fun=func, x0=init_params, jac=True, method='BFGS')
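For this call to work, func must return a (loss, gradients) pair, which the snippet in the question no longer does. A minimal sketch of the factory's inner function along the lines of the original Gist, using tf.GradientTape and the factory's stitch indices idx:

def f(params_1d):
    # update the model, then compute the loss under a tape so it can be differentiated
    assign_new_model_parameters(params_1d)
    with tf.GradientTape() as tape:
        loss_value = loss_f(x_u_train, u_train, model)
    # flatten the per-variable gradients with the same stitch indices as the parameters
    grads = tf.dynamic_stitch(idx, tape.gradient(loss_value, model.trainable_variables))
    return loss_value, grads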
jac=True lets SciPy know that func also returns the gradients. However, it does not work for L-BFGS-B. I looked into the source code of SciPy's implementation. Here and here, it looks like the implementation of L-BFGS-B does not consider the case where func returns both the loss and the gradients. So I would say this is a problem in SciPy, because the documentation clearly states that when jac=True, the optimizer should accept both the loss and the gradients from func.

So if you still want to use L-BFGS-B, one workaround is to also return a function (say func_g) from the function factory. func_g is the same as func, but it returns the gradients. Then set jac=func_g in the SciPy optimizer:
results = scipy.optimize.minimize(fun=func, x0=init_params, jac=func_g, method='L-BFGS-B')

jac=func_g tells SciPy to use func_g to compute the gradients.
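To make the workaround concrete, here is a minimal sketch of the two wrappers, assuming the gradient-returning f from above is in scope. The numpy/float conversions are my assumption about what SciPy needs (a plain scalar from fun and a float64 array from jac), not code from the answer:

def func(x):
    # objective only: SciPy expects a plain scalar
    loss_value, _ = f(tf.constant(x, dtype=tf.float32))
    return float(loss_value)

def func_g(x):
    # gradients only: SciPy expects a float64 array of shape (n,)
    _, grads = f(tf.constant(x, dtype=tf.float32))
    return grads.numpy().astype(np.float64)

results = scipy.optimize.minimize(fun=func, x0=init_params.numpy(),
                                  jac=func_g, method='L-BFGS-B')

Note that this evaluates the model twice per iterate (once for the loss, once for the gradients); caching the last (loss, grads) pair keyed on x would avoid the duplicate forward pass.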