I want to train a Siamese LSTM so that the angular distance between the two outputs is 1 (low similarity) if the corresponding label is 0, and 0 (high similarity) if the label is 1.
I took the formula for angular distance from here: https://en.wikipedia.org/wiki/Cosine_similarity
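For reference, the angular distance from that page can be sketched in plain NumPy (this is just the formula, not the model code):

```python
import numpy as np

def angular_distance(u, v):
    # cosine similarity, then angular distance normalized into [0, 1]
    cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(cos_sim) / np.pi

a = np.array([1.0, 0.0])
print(angular_distance(a, a))                      # identical vectors -> 0.0
print(angular_distance(a, np.array([-1.0, 0.0])))  # opposite vectors: close to 1.0
```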
Here is my model code:
import math
import numpy as np
import tensorflow as tf

# inputs are int arrays of Unicode code points built from the strings
# similar strings should yield a low angular distance
left_input = tf.keras.layers.Input(shape=[None, 1], dtype='float32')
right_input = tf.keras.layers.Input(shape=[None, 1], dtype='float32')
lstm = tf.keras.layers.LSTM(10)
left_embedding = lstm(left_input)
right_embedding = lstm(right_input)
# cosine_layer is the operation to get cosine similarity
cosine_layer = tf.keras.layers.Dot(axes=1, normalize=True)
cosine_similarity = cosine_layer([left_embedding, right_embedding])
# the next two lines compute the angular distance, but with inverted labels
arccos = tf.math.acos(cosine_similarity)
angular_distance = arccos / math.pi # not 1. - (arccos / math.pi)
model = tf.keras.Model([left_input, right_input], [angular_distance])
model.compile(loss='binary_crossentropy', optimizer='sgd')
print(model.summary())
The model summary looks fine to me, and when I test with fixed input values I also get the correct values for the cosine similarity etc.:
Model: "model_37"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_95 (InputLayer) [(None, None, 1)] 0
__________________________________________________________________________________________________
input_96 (InputLayer) [(None, None, 1)] 0
__________________________________________________________________________________________________
lstm_47 (LSTM) (None, 10) 480 input_95[0][0]
input_96[0][0]
__________________________________________________________________________________________________
dot_47 (Dot) (None, 1) 0 lstm_47[0][0]
lstm_47[1][0]
__________________________________________________________________________________________________
tf_op_layer_Acos_52 (TensorFlow [(None, 1)] 0 dot_47[0][0]
__________________________________________________________________________________________________
tf_op_layer_truediv_37 (TensorF [(None, 1)] 0 tf_op_layer_Acos_52[0][0]
__________________________________________________________________________________________________
tf_op_layer_sub_20 (TensorFlowO [(None, 1)] 0 tf_op_layer_truediv_37[0][0]
__________________________________________________________________________________________________
tf_op_layer_sub_21 (TensorFlowO [(None, 1)] 0 tf_op_layer_sub_20[0][0]
__________________________________________________________________________________________________
tf_op_layer_Abs (TensorFlowOpLa [(None, 1)] 0 tf_op_layer_sub_21[0][0]
==================================================================================================
Total params: 480
Trainable params: 480
Non-trainable params: 0
__________________________________________________________________________________________________
None
But during training I always get a loss of NaN:
model.fit([np.array(x_left_train), np.array(x_right_train)], np.array(y_train).reshape((-1,1)), batch_size=1, epochs=2, validation_split=0.1)
Train on 14400 samples, validate on 1600 samples
Epoch 1/2
673/14400 [>.............................] - ETA: 5:42 - loss: nan
Is this not the right way to get the similarity between two vectors and to train my network to produce such vectors?
Binary cross-entropy computes `log(output)` and `log(1 - output)`. This means your output must be strictly greater than 0 and strictly less than 1, otherwise you end up taking the `log` of a negative number, which results in `NaN`. (Note: `log(0)` would give you `-inf`, which is not as bad as `NaN` but still not ideal.)
Mathematically your output should lie within the correct interval, but given the inaccuracies of floating-point arithmetic I can well imagine that this is your problem. That is just a guess, though.
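To see how this goes wrong numerically, here is a minimal illustration in plain NumPy (not the model itself):

```python
import numpy as np

# arccos is only defined on [-1, 1]; a cosine similarity that overshoots
# 1.0 by even one ulp due to rounding already falls outside that domain
cos_sim = 1.0 + np.finfo(np.float64).eps
print(np.arccos(cos_sim))  # nan

# and once the network output reaches exactly 0 or 1, binary
# cross-entropy takes log(0)
print(np.log(0.0))  # -inf
```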
So try forcing the output to be greater than 0 and less than 1, for example by combining `clip` with a small epsilon:
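The answer's code example is cut off at this point. A minimal sketch of what such clipping could look like, assuming `tf.clip_by_value` and an epsilon of `1e-7` (the epsilon value and variable names are my choices, not from the original answer):

```python
import math
import tensorflow as tf

epsilon = 1e-7  # keeps values strictly inside the open interval

# stand-in for the model's cosine similarity output; 1.0 and -1.0 sit
# exactly on the edge of acos's domain
cosine_similarity = tf.constant([[1.0], [-1.0], [0.5]])

# clamp the cosine into acos's open domain, then clamp the final output
# away from exactly 0 and 1 before it reaches binary cross-entropy
cosine_clipped = tf.clip_by_value(cosine_similarity, -1.0 + epsilon, 1.0 - epsilon)
arccos = tf.math.acos(cosine_clipped)
angular_distance = tf.clip_by_value(arccos / math.pi, epsilon, 1.0 - epsilon)
print(angular_distance.numpy())  # all values strictly inside (0, 1)
```

In the model above, the same two `tf.clip_by_value` calls would go around the output of the `Dot` layer and around the final `arccos / math.pi` expression.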