I am trying to build a custom 1D convolution layer in TensorFlow. I have already checked that the layer does what it is supposed to do. However, when I insert it into a sequential Keras model, I get a warning that gradients do not exist for the variables of the custom layer.
Could you explain why this happens and how I can fix it?
Here is the code:
```python
import tensorflow as tf
import numpy as np


class customC1DLayer(tf.keras.layers.Layer):
    def __init__(self, filter_size=1, activation=None, **kwargs):
        super(customC1DLayer, self).__init__(**kwargs)
        self.filter_size = filter_size
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        self.filter = self.add_weight('filter', shape=[self.filter_size],
                                      trainable=True, dtype=tf.float32)
        self.padding = tf.Variable(
            initial_value=tf.zeros(shape=[input_shape[-1] - self.filter_size],
                                   dtype=tf.float32),
            trainable=False)
        padded_filter = tf.concat([self.filter, self.padding], axis=0)
        col = tf.concat([padded_filter[:1], tf.zeros_like(padded_filter[1:])], axis=0)
        self.augmented_filter = tf.linalg.LinearOperatorToeplitz(padded_filter, col).to_dense()

    def call(self, inputs):
        outputs = tf.transpose(tf.matmul(self.augmented_filter, inputs, transpose_b=True))
        if self.activation is not None:
            outputs = self.activation(outputs)
        return outputs
```
To explain the code: in the `build` method I initialize some weights, e.g. `[a b c]`, and `augmented_filter` is then just the Toeplitz matrix `[[a b c 0 0], [0 a b c 0], [0 0 a b c]]`.
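This construction can be checked in isolation. A small sketch with made-up example values (a filter `[1, 2, 3]` padded to an input width of 5; `padded` and `col` mirror the `padded_filter` and `col` tensors in `build`):

```python
import tensorflow as tf

# Example filter [a, b, c] = [1., 2., 3.], zero-padded to the input width 5.
padded = tf.constant([1., 2., 3., 0., 0.])
# First row of the operator: only the first entry is non-zero.
col = tf.concat([padded[:1], tf.zeros_like(padded[1:])], axis=0)
# LinearOperatorToeplitz takes (first column, first row); to_dense()
# materializes a 5x5 banded lower-triangular matrix whose first column
# is the padded filter.
dense = tf.linalg.LinearOperatorToeplitz(padded, col).to_dense()
print(dense.numpy())
```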
I know this kind of error can occur when using non-differentiable functions. However, in this case, as far as I can tell, I am only using matrix operations that should be differentiable.
The problem is that, in the `call` function, there is no path from `self.filter` to `augmented_filter` — and it is presumably the former you want gradients for. The transformation happens once in `build`, outside any gradient tape, so as it stands the trainable variable is never actually used in the forward pass and no gradient can be computed for it. You need to perform this transformation inside `call`:
```python
class customC1DLayer(tf.keras.layers.Layer):
    def __init__(self, filter_size=1, activation=None, **kwargs):
        super(customC1DLayer, self).__init__(**kwargs)
        self.filter_size = filter_size
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        self.filter = self.add_weight('filter', shape=[self.filter_size],
                                      trainable=True, dtype=tf.float32)
        self.padding = tf.Variable(
            initial_value=tf.zeros(shape=[input_shape[-1] - self.filter_size],
                                   dtype=tf.float32),
            trainable=False)

    def call(self, inputs):
        # Build the Toeplitz matrix here, so the computation is recorded
        # by the gradient tape on every forward pass.
        padded_filter = tf.concat([self.filter, self.padding], axis=0)
        col = tf.concat([padded_filter[:1], tf.zeros_like(padded_filter[1:])], axis=0)
        augmented_filter = tf.linalg.LinearOperatorToeplitz(padded_filter, col).to_dense()
        outputs = tf.transpose(tf.matmul(augmented_filter, inputs, transpose_b=True))
        if self.activation is not None:
            outputs = self.activation(outputs)
        return outputs
```
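The failure mode can be reproduced without the layer at all. A minimal sketch (assuming eager execution): a tensor derived from a variable *before* the tape is recording behaves like a constant, while the same computation done *under* the tape yields a gradient — which is exactly the difference between doing the concatenation in `build` versus `call`:

```python
import tensorflow as tf

v = tf.Variable([1., 2., 3.])

# Analogue of building augmented_filter in build(): the result is a
# plain tensor, computed before any tape is watching.
frozen = tf.concat([v, tf.zeros(2)], axis=0)

with tf.GradientTape(persistent=True) as tape:
    y_frozen = tf.reduce_sum(frozen)            # no path back to v
    live = tf.concat([v, tf.zeros(2)], axis=0)  # analogue of doing it in call()
    y_live = tf.reduce_sum(live)

print(tape.gradient(y_frozen, v))  # None -> "gradients do not exist" warning
print(tape.gradient(y_live, v))    # gradient of ones w.r.t. v
```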