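What I want to do: I want to connect arbitrary layers of different models to create a new Keras model.
What I have found so far: https://github.com/keras-team/keras/issues/4205: calling one Model on a tensor (the model's __call__) to change the input of another model. My problems with this approach:
https://github.com/keras-team/keras/issues/3465: adding new layers to a base model by building on any output of the base model. The problems here:
What I have tried: my approach for connecting arbitrary layers of different models:
At first I was really optimistic, because summary() and plot_model() gave me exactly what I wanted, so the node graph should be fine, right? But I ran into an error when training. The approaches from the "What I have found so far" section train fine, but my approach fails. This is the error message: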
  File "C:\Anaconda\envs\dlpipe\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 508, in apply_op
    (input_name, err))
ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.
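Possibly important: I am using TensorFlow as the backend. I was able to trace the error back to its root. Something goes wrong when the gradients are computed. Normally a gradient is computed for every node, but with my approach all nodes of the base network get None. So basically the failure happens in keras/optimizers.py, in get_updates(), when the gradients are computed (grad = self.get_gradients(loss, params)).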
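To make the None gradients more concrete, here is a minimal pure-Python sketch. It does not use Keras or TensorFlow. The Node class and the helpers reachable_from and toy_gradients are hypothetical names invented for this illustration; the only thing borrowed from the backend is the convention that a parameter the loss does not depend on gets None instead of a gradient.

```python
class Node:
    """Toy graph node: a value computed from zero or more input nodes."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


def reachable_from(loss):
    """Collect every node the loss depends on by walking the inputs backwards."""
    seen = set()
    stack = [loss]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(node.inputs)
    return seen


def toy_gradients(loss, params):
    """Return a placeholder string per parameter, or None if it is disconnected."""
    used = reachable_from(loss)
    return [
        "d(%s)/d(%s)" % (loss.name, p.name) if p in used else None
        for p in params
    ]


# Base model: base_out depends on base_in and the base weights.
base_in = Node("base_in")
base_w = Node("base_w")
base_out = Node("base_out", [base_in, base_w])

# Encoder whose output tensor still points at its own placeholder input.
enc_in = Node("enc_in")
enc_w = Node("enc_w")
stale_enc_out = Node("enc_out", [enc_in, enc_w])
stale_loss = Node("loss", [stale_enc_out])

stale_grads = toy_gradients(stale_loss, [base_w, enc_w])
print(stale_grads)  # [None, 'd(loss)/d(enc_w)']

# Encoder output rebuilt on top of base_out: every weight is reachable.
fixed_enc_out = Node("enc_out", [base_out, enc_w])
fixed_loss = Node("loss", [fixed_enc_out])

fixed_grads = toy_gradients(fixed_loss, [base_w, enc_w])
print(fixed_grads)  # ['d(loss)/d(base_w)', 'd(loss)/d(enc_w)']
```

In the first case the loss never touches base_w, which is the same situation in which all weights of the base network ended up with None.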
Here is the code (without the training part) that implements all three approaches:
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.models import Model
from keras.utils import plot_model


def create_base():
    # convolutional feature extractor: (32, 32, 3) -> (8, 8, 64)
    in_layer = Input(shape=(32, 32, 3), name="base_input")
    x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_1")(in_layer)
    x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_2")(x)
    x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling_2d_1")(x)
    x = Dropout(0.25, name="base_dropout")(x)

    x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_3")(x)
    x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_4")(x)
    x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling2d_2")(x)
    x = Dropout(0.25, name="base_dropout_2")(x)

    return Model(inputs=in_layer, outputs=x, name="base_model")


def create_encoder():
    # classifier head; its input shape matches the output of the base model
    in_layer = Input(shape=(8, 8, 64))
    x = Flatten(name="encoder_flatten")(in_layer)
    x = Dense(512, activation="relu", name="encoder_dense_1")(x)
    x = Dropout(0.5, name="encoder_dropout_2")(x)
    x = Dense(10, activation="softmax", name="encoder_dense_2")(x)
    return Model(inputs=in_layer, outputs=x, name="encoder_model")


def extend_base(input_model):
    # approach from issue 3465: stack new layers on the output of the base model
    x = Flatten(name="custom_flatten")(input_model.output)
    x = Dense(512, activation="relu", name="custom_dense_1")(x)
    x = Dropout(0.5, name="custom_dropout_2")(x)
    x = Dense(10, activation="softmax", name="custom_dense_2")(x)
    return Model(inputs=input_model.input, outputs=x, name="custom_edit")


def connect_layers(from_tensor, to_layer, clear_inbound_nodes=True):
    # my approach: re-call to_layer on from_tensor and patch the nodes that follow it
    try:
        tmp_output = to_layer.output
    except AttributeError:
        raise ValueError("Connecting to shared layers is not supported!")

    if clear_inbound_nodes:
        to_layer.inbound_nodes = []
    else:
        tensor_list = to_layer.inbound_nodes[0].input_tensors
        tensor_list.append(from_tensor)
        from_tensor = tensor_list
        to_layer.inbound_nodes = []

    new_output = to_layer(from_tensor)
    # replace the old output tensor in the input lists of all outbound nodes
    for out_node in to_layer.outbound_nodes:
        for i, in_tensor in enumerate(out_node.input_tensors):
            if in_tensor == tmp_output:
                out_node.input_tensors[i] = new_output


if __name__ == "__main__":
    base = create_base()
    encoder = create_encoder()

    # approach 1 (issue 4205): call the encoder model on the output of the base model
    #new_model_1 = Model(inputs=base.input, outputs=encoder(base.output))
    #plot_model(new_model_1, to_file="plots/new_model_1.png")

    # approach 2 (issue 3465): add new layers to the base model
    new_model_2 = extend_base(base)
    plot_model(new_model_2, to_file="plots/new_model_2.png")
    print(new_model_2.summary())

    # approach 3 (mine): connect a layer of the base model to a layer of the encoder
    base_layer = base.get_layer("base_dropout_2")
    top_layer = encoder.get_layer("encoder_flatten")
    connect_layers(base_layer.output, top_layer)
    new_model_3 = Model(inputs=base.input, outputs=encoder.output)
    plot_model(new_model_3, to_file="plots/new_model_3.png")
    print(new_model_3.summary())
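I know this is a lot of text and a lot of code, but I feel it is needed to explain the problem.
Edit: I just tried Theano, and I think its error is more informative: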
theano.gradient.DisconnectedInputError:
Backtrace when that variable is created:
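It seems that every layer in the encoder model still has some connection to the encoder input layer through its TensorVariables.
So this is what I ended up with for the connect_layers() function: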
def connect_layers(from_tensor, to_layer, old_tensor=None):
    # if there is any shared layer after the to_layer, it is not supported
    try:
        tmp_output = to_layer.output
    except AttributeError:
        raise ValueError("Connecting to shared layers is not supported!")

    # check if to_layer has multiple input_tensors, and therefore some sort of merge layer
    if len(to_layer.inbound_nodes[0].input_tensors) > 1:
        tensor_list = to_layer.inbound_nodes[0].input_tensors
        found_tensor = False
        for i, tensor in enumerate(tensor_list):
            # exchange the old tensor with the new created tensor
            if tensor == old_tensor:
                tensor_list[i] = from_tensor
                found_tensor = True
                break
        if not found_tensor:
            tensor_list.append(from_tensor)
        from_tensor = tensor_list
        to_layer.inbound_nodes = []
    else:
        to_layer.inbound_nodes = []

    new_output = to_layer(from_tensor)

    tmp_out_nodes = to_layer.outbound_nodes[:]
    to_layer.outbound_nodes = []

    # recursively connect all layers after the current to_layer
    for out_node in tmp_out_nodes:
        l = out_node.outbound_layer
        print("Connecting: " + str(to_layer) + " ----> " + str(l))
        connect_layers(new_output, l, tmp_output)
Since every tensor carries all the information about its root tensor (via -> owner.inputs -> owner.inputs -> ...), all tensors that follow the new_output tensor have to be updated as well.
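The point about root tensors can be shown with a small pure-Python sketch. This is a toy model, not Keras or Theano code: the Tensor and Layer classes and the roots helper are hypothetical, and Tensor.inputs only stands in for the owner.inputs chain mentioned above.

```python
class Tensor:
    """Toy tensor that remembers the tensors it was computed from."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)  # stands in for owner.inputs


class Layer:
    """Toy layer: calling it creates a new output tensor from its input."""

    def __init__(self, name):
        self.name = name

    def __call__(self, x):
        return Tensor(self.name + "_out", [x])


def roots(tensor):
    """Follow inputs -> inputs -> ... and return the names of the root tensors."""
    if not tensor.inputs:
        return {tensor.name}
    found = set()
    for parent in tensor.inputs:
        found |= roots(parent)
    return found


# Encoder built on its own input placeholder.
flatten = Layer("encoder_flatten")
dense = Layer("encoder_dense_1")
encoder_input = Tensor("encoder_input")
h = flatten(encoder_input)
y = dense(h)

# Output of the base model.
base_output = Tensor("base_dropout_2_out", [Tensor("base_input")])

# Re-calling only the first layer leaves the old downstream tensor untouched.
new_h = flatten(base_output)
print(roots(new_h))  # {'base_input'}
print(roots(y))      # {'encoder_input'}  (still the old root)

# Re-calling every following layer gives an output rooted in the base input.
new_y = dense(new_h)
print(roots(new_y))  # {'base_input'}
```

This is why the function above recurses through the outbound nodes: each following layer has to be called again so that its output is rebuilt on the new tensor.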
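Debugging this was much easier with Theano than with the TensorFlow backend.
I still need to figure out how to deal with shared layers. With the current implementation it is not possible to connect other models that contain a shared layer after the first to_layer.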