Training a neural network with a SciPy optimizer and TensorFlow 2.0


With the introduction of TensorFlow 2.0, the SciPy interface (tf.contrib.opt.ScipyOptimizerInterface) has been removed. Nevertheless, I would still like to use the SciPy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (a Keras Sequential model). For the optimizer to work, it needs a function fun(x0) as input, where x0 is an array of shape (n,). Therefore, the first step is to flatten the weight matrices into a single vector of the required shape. To do this, I modified the code provided at https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/, which provides a function factory meant to create exactly such a function fun(x0). However, the code does not seem to work and the loss function does not decrease. I would be very grateful if someone could help me sort this out.
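
For reference, the flattening and unflattening step can be illustrated in isolation with tf.dynamic_stitch and tf.dynamic_partition. The following is a minimal, self-contained sketch with two toy variables standing in for model.trainable_variables (the names are illustrative only); the function factory in the code below applies the same idea to a whole model:

import tensorflow as tf

# two toy "weight" tensors standing in for model.trainable_variables
variables = [tf.Variable(tf.ones((2, 3))), tf.Variable(tf.zeros((3,)))]
shapes = [v.shape for v in variables]
sizes = [int(tf.size(v)) for v in variables]

# stitch indices (where each entry lands in the flat vector) and
# partition indices (which tensor each flat entry belongs to)
idx, part, count = [], [], 0
for i, (shape, n) in enumerate(zip(shapes, sizes)):
    idx.append(tf.reshape(tf.range(count, count + n), shape))
    part.extend([i] * n)
    count += n
part = tf.constant(part)

# flatten: a 1-D vector of length 9, i.e. the kind of x0 that scipy.optimize.minimize expects
flat = tf.dynamic_stitch(idx, variables)

# unflatten: recover tensors with the original shapes from the flat vector
pieces = tf.dynamic_partition(flat, part, len(variables))
restored = [tf.reshape(p, s) for p, s in zip(pieces, shapes)]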

Here is the piece of code I am using:

import numpy as np
import scipy.optimize
import tensorflow as tf


def loss_function(x_u_train, u_train, network):
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)


def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create a function required by tfp.optimizer.lbfgs_minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss_f [in]: a function with signature loss_value = loss_f(x_u_train, u_train, model).
        x_u_train [in]: the input part of the training data.
        u_train [in]: the output part of the training data.

    Returns:
        A function with the signature: loss_value = f(model_parameters).
    """
    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare the required index information first
    count = 0
    idx = []   # stitch indices
    part = []  # partition indices

    for i, shape in enumerate(shapes):
        n = np.prod(shape)
        idx.append(tf.reshape(tf.range(count, count + n, dtype=tf.int32), shape))
        part.extend([i] * n)
        count += n

    part = tf.constant(part)

    def assign_new_model_parameters(params_1d):
        """Updates the model's parameters from a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """
        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):
            model.trainable_variables[i].assign(
                tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create the function that will be returned by this factory
    def f(params_1d):
        """This function is created by function_factory.

        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss.
        """
        # update the parameters in the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)

        # print out iteration & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)

        return loss_value

    # store this information as members so we can use it outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f


# x_u_train, u_train and model (a tf.keras.Sequential) are defined elsewhere
func = function_factory(model, loss_function, x_u_train, u_train)

# convert initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)

# train the model with the L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')

Here, model is a tf.keras.Sequential object. Thanks in advance for your help!
python tensorflow keras neural-network scipy-optimize
1 Answer
I guess SciPy does not know how to compute gradients of TensorFlow objects. Try using the original function factory (i.e. the one that returns the gradient together with the loss), and set jac=True in scipy.optimize.minimize.
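
For illustration, here is a rough sketch of what the returned function could look like when it reports both the loss and the gradient, using tf.GradientTape inside the same function_factory scaffolding shown in the question. The conversion to float64 NumPy values at the end is an assumption about what SciPy works with, not part of the original gist:

# a drop-in replacement for f inside function_factory: returns loss and gradient
def f(params_1d):
    """Evaluates the loss and its gradient at params_1d, as SciPy expects with jac=True."""
    # SciPy passes a float64 NumPy array, so cast before updating the model
    assign_new_model_parameters(tf.cast(params_1d, dtype=tf.float32))
    with tf.GradientTape() as tape:
        loss_value = loss_f(x_u_train, u_train, model)
    # gradient w.r.t. the trainable variables, flattened into the same 1-D layout as params_1d
    grads = tape.gradient(loss_value, model.trainable_variables)
    grads_1d = tf.dynamic_stitch(idx, grads)
    # return plain Python/NumPy values so SciPy can consume them directly
    return float(loss_value), grads_1d.numpy().astype(np.float64)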

I tested the Python code from the original gist, replacing tfp.optimizer.lbfgs_minimize with the SciPy optimizer. It works with the BFGS method:

results = scipy.optimize.minimize(fun=func, x0=init_params, jac=True, method='BFGS')

jac=True lets SciPy know that func also returns the gradient.

For L-BFGS-B, however, it does not work. I looked into the source code of SciPy's implementation (here and here), and it seems that the L-BFGS-B implementation does not handle the case where func returns both the loss and the gradient. So I would say this is a SciPy issue, since the documentation clearly states that when jac=True the optimizer should accept both the loss and the gradient from func.

So if you still want to use L-BFGS-B, a workaround is to have the function factory also return a second function, say func_g, which works like func but returns only the gradient. Then set jac=func_g in the SciPy optimizer:

results = scipy.optimize.minimize(fun=func, x0=init_params, jac=func_g, method='L-BFGS-B')

jac=func_g tells SciPy to use func_g to compute the gradients.
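
A possible shape for such a func_g, defined next to f inside function_factory, is sketched below under the same assumptions as the question's code; the names and the two-value return from the factory are illustrative, not the answerer's exact code:

# inside function_factory, alongside f: a gradient-only companion for jac=
def func_g(params_1d):
    """Computes only the flattened gradient at params_1d (the loss value is discarded)."""
    assign_new_model_parameters(tf.cast(params_1d, dtype=tf.float32))
    with tf.GradientTape() as tape:
        loss_value = loss_f(x_u_train, u_train, model)
    grads = tape.gradient(loss_value, model.trainable_variables)
    # same 1-D layout as the parameter vector, converted to float64 for SciPy
    return tf.dynamic_stitch(idx, grads).numpy().astype(np.float64)

# the factory would then end with, e.g.:  return f, func_g
# and the call site becomes:  func, func_g = function_factory(model, loss_function, x_u_train, u_train)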