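I'm trying to train a model to classify whether an answer actually answers the given question, using this dataset.
I'm training in batches and using GloVe word embeddings. Every batch except the last contains 1000 examples. The approach I'm trying is to feed the LSTM the first sentence (the question), then feed it the second sentence (the answer), and have it output a number between 0 and 1 through a sigmoid function.
The problem is that the loss keeps repeating after epoch 1. It never converges to the correct result, which is 1 if the answer belongs to the question and 0 otherwise.
My code is as follows: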
import torch
import torch.nn as nn
import torch.optim as optim

# Use the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class QandA(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(QandA, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = 1
        self.bidirectional = True
        self.lstm = nn.LSTM(input_size, self.hidden_size, num_layers=self.num_layers, bidirectional=self.bidirectional)
        self.lstm.to(device)
        self.hidden2class = nn.Linear(self.hidden_size * 2, 1)
        self.hidden2class.to(device)

    def forward(self, glove_vec, glove_vec2):
        # glove_vec.shape = (sentence_len, batch_size, 300)
        # Run the question first, then start the answer from the question's final state.
        output, hidden = self.lstm(glove_vec)
        output, _ = self.lstm(glove_vec2, hidden)
        # output.shape = (sentence_len, batch_size, hidden_size * 2)
        output = self.hidden2class(output[-1, :, :])
        # output.shape = (batch_size, 1)
        return torch.sigmoid(output)

model = QandA(300, 60).to(device)
loss_function = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)
Is my approach wrong, so that it can't work in practice? Or am I overlooking some other problem?
Edit: additional code for the training:
import math

batch_size = 1000
# load_dataset loads the data from the file.
questions, answers, outputs = load_dataset()
N = len(outputs)
losses = []
for epoch in range(10):
    for batch in range(math.ceil(N / batch_size)):
        model.zero_grad()
        # get_data gets the data from the dataset (size batch_size, sequence batch)
        input1, input2, targets = get_data(batch, batch_size)
        class_pred = model(input1, input2)
        loss = loss_function(class_pred, targets)
        loss.backward()
        optimizer.step()
        # Record the batch loss so it can be inspected later.
        losses.append(loss.item())
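I'd suggest encoding the question and the answer independently and putting a classifier on top of them. For example, you can encode the question and the answer with two separate biLSTMs, concatenate their representations, and feed the result to a classifier. The code could look something like this (not tested, but hopefully you get the idea):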
import torch
import torch.nn as nn

# `device` is the same object as in the question's code.
class QandA(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(QandA, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = 1
        self.bidirectional = True
        self.lstm_question = nn.LSTM(input_size, self.hidden_size, num_layers=self.num_layers, bidirectional=self.bidirectional)
        self.lstm_question.to(device)
        self.lstm_answer = nn.LSTM(input_size, self.hidden_size, num_layers=self.num_layers, bidirectional=self.bidirectional)
        self.lstm_answer.to(device)
        self.fc = nn.Linear(self.hidden_size * 4, 1)
        self.fc.to(device)

    def forward(self, glove_question, glove_answer):
        # glove.shape = (sentence_len, batch_size, 300)
        question_last_hidden, _ = self.lstm_question(glove_question)
        # question_last_hidden.shape = (question_len, batch_size, hidden_size * 2)
        answer_last_hidden, _ = self.lstm_answer(glove_answer)
        # answer_last_hidden.shape = (answer_len, batch_size, hidden_size * 2)

        # Reduce each LSTM output to one vector per example by taking the last time step.
        # If you have multiple LSTM layers, you need to take only the last layer's
        # backward/forward hidden states.
        question_last_hidden = question_last_hidden[-1, :, :]
        answer_last_hidden = answer_last_hidden[-1, :, :]

        # Concatenate over the feature dimension.
        representation = torch.cat([question_last_hidden, answer_last_hidden], -1)
        # representation.shape = (batch_size, hidden_size * 4)
        output = self.fc(representation)
        # output.shape = (batch_size, 1)
        return torch.sigmoid(output)
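On the comment in the code above about taking only the last layer's forward/backward hidden states: here is a minimal sketch of one way to do that, reading the final states from `h_n` instead of indexing the output at the last time step. The toy sizes, the random input standing in for GloVe vectors, and the variable names (`sentence_vec`, `fwd_matches`, `bwd_matches`) are my own assumptions for illustration, not part of the original code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes chosen only for illustration.
seq_len, batch_size, input_size, hidden_size = 7, 4, 300, 60

lstm = nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True)
x = torch.randn(seq_len, batch_size, input_size)  # stand-in for GloVe vectors

output, (h_n, c_n) = lstm(x)
# output.shape = (seq_len, batch_size, hidden_size * 2)
# h_n.shape    = (num_layers * 2, batch_size, hidden_size)

# In the last layer, h_n[-2] is the forward direction and h_n[-1] the backward one.
sentence_vec = torch.cat([h_n[-2], h_n[-1]], dim=-1)
# sentence_vec.shape = (batch_size, hidden_size * 2)

# The forward half equals output[-1], but the backward half equals output[0]:
# the backward direction finishes reading at the first time step.
fwd_matches = torch.allclose(sentence_vec[:, :hidden_size], output[-1, :, :hidden_size])
bwd_matches = torch.allclose(sentence_vec[:, hidden_size:], output[0, :, hidden_size:])
```

Note that with a bidirectional LSTM, `output[-1, :, :]` contains the backward direction's state after it has seen only the last token, so building the sentence vector from `h_n` like this may give the classifier a more informative representation. I haven't checked whether it makes a difference on your dataset.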