With the introduction of TensorFlow 2.0, the SciPy interface (tf.contrib.opt.ScipyOptimizerInterface) has been removed. However, I still want to use the SciPy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (a Keras Sequential model). For the optimizer to work, it needs as input a function fun(x0), where x0 is an array of shape (n,). Therefore, the first step is to flatten the weight matrices to obtain a vector of the required shape. To this end, I modified the code provided at https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/, which provides a function factory meant to create such a function fun(x0). However, the code does not seem to work and the loss function does not decrease. I would be very grateful if somebody could help me work this out.
Here is the snippet of code I am using:

import numpy as np
import scipy.optimize
import tensorflow as tf

func = function_factory(model, loss_function, x_u_train, u_train)

# convert the initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)

# train the model with the L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')
def loss_function(x_u_train, u_train, network):
    # mean squared error between the network prediction and the target
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)
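As a quick sanity check of the loss on toy data (the model and shapes here are hypothetical, not the actual training set):

check_model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
x_toy = tf.random.normal((16, 2))
u_toy = tf.random.normal((16, 1))
print(float(loss_function(x_toy, u_toy, check_model)))  # a non-negative scalar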
def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create the objective function required by scipy.optimize.minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss_f [in]: a function with signature loss_value = loss_f(x_u_train, u_train, model).
        x_u_train [in]: the input part of the training data.
        u_train [in]: the output part of the training data.

    Returns:
        A function with the signature:
            loss_value = f(model_parameters).
    """

    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare the required information first
    count = 0
    idx = []   # stitch indices
    part = []  # partition indices
    for i, shape in enumerate(shapes):
        n = np.product(shape)
        idx.append(tf.reshape(tf.range(count, count + n, dtype=tf.int32), shape))
        part.extend([i] * n)
        count += n

    part = tf.constant(part)

    def assign_new_model_parameters(params_1d):
        """Updates the model's parameters from a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """
        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):
            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create the function that will be returned by this factory
    def f(params_1d):
        """A function created by function_factory.

        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss value.
        """
        # update the parameters of the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)

        # print out the iteration count & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)
        return loss_value

    # store this information as members so we can use it outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f
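For reference, here is a minimal sketch of the stitch/partition round trip the factory relies on. The variables and shapes are toy examples, not the actual model, and tf/np are assumed imported as above:

# toy demonstration of the flatten/unflatten mechanism (hypothetical shapes)
a = tf.Variable(tf.ones((2, 3)))   # e.g. a kernel
b = tf.Variable(tf.zeros((3,)))    # e.g. a bias

demo_shapes = tf.shape_n([a, b])
count, demo_idx, demo_part = 0, [], []
for i, shape in enumerate(demo_shapes):
    n = int(np.product(shape))
    demo_idx.append(tf.reshape(tf.range(count, count + n, dtype=tf.int32), shape))
    demo_part.extend([i] * n)
    count += n

flat = tf.dynamic_stitch(demo_idx, [a, b])                     # 1D tensor of shape (9,)
pieces = tf.dynamic_partition(flat, tf.constant(demo_part), 2)
restored = [tf.reshape(p, s) for p, s in zip(pieces, demo_shapes)]  # original shapes again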
Here, model is a tf.keras.Sequential object. Thanks in advance for your help!
You need to set jac=True in scipy.optimize.minimize. I tested the Python code from the original Gist and replaced tfp.optimizer.lbfgs_minimize with the SciPy optimizer. It works with the BFGS method:

results = scipy.optimize.minimize(fun=func, x0=init_params, jac=True, method='BFGS')
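For this call to work, func must return a (loss, gradients) pair, which the snippet in the question no longer does. A minimal sketch of the factory's inner function along the lines of the original Gist, using tf.GradientTape and the factory's stitch indices idx:

def f(params_1d):
    # update the model, then compute the loss under a tape so it can be differentiated
    assign_new_model_parameters(params_1d)
    with tf.GradientTape() as tape:
        loss_value = loss_f(x_u_train, u_train, model)
    # flatten the per-variable gradients with the same stitch indices as the parameters
    grads = tf.dynamic_stitch(idx, tape.gradient(loss_value, model.trainable_variables))
    return loss_value, grads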
jac=True lets SciPy know that func also returns the gradients. However, it does not work for L-BFGS-B. I looked into the source code of SciPy's implementation. Here and here, it looks like the implementation of L-BFGS-B does not consider the case where func returns both the loss and the gradients. So I would say this is a problem in SciPy, because the documentation clearly states that when jac=True, the optimizer should accept both the loss and the gradients from func.

So if you still want to use L-BFGS-B, one workaround is to also return a function (say func_g) from the function factory. func_g is the same as func, but it returns the gradients. Then set jac=func_g in the SciPy optimizer:
results = scipy.optimize.minimize(fun=func, x0=init_params, jac=func_g, method='L-BFGS-B')

jac=func_g tells SciPy to use func_g to compute the gradients.
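To make the workaround concrete, here is a minimal sketch of the two wrappers, assuming the gradient-returning f from above is in scope. The numpy/float conversions are my assumption about what SciPy needs (a plain scalar from fun and a float64 array from jac), not code from the answer:

def func(x):
    # objective only: SciPy expects a plain scalar
    loss_value, _ = f(tf.constant(x, dtype=tf.float32))
    return float(loss_value)

def func_g(x):
    # gradients only: SciPy expects a float64 array of shape (n,)
    _, grads = f(tf.constant(x, dtype=tf.float32))
    return grads.numpy().astype(np.float64)

results = scipy.optimize.minimize(fun=func, x0=init_params.numpy(),
                                  jac=func_g, method='L-BFGS-B')

Note that this evaluates the model twice per iterate (once for the loss, once for the gradients); caching the last (loss, grads) pair keyed on x would avoid the duplicate forward pass.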