PyTorch LSTM - training a Q&A classifier


I am trying to train a model to classify whether an answer actually answers the given question, using this dataset.

I am training in batches of 1000 (except for the last one) and using GloVe word embeddings. The approach I am trying is to first feed the LSTM the first sentence (the question), then feed it the second sentence (the answer), and have it output a number between 0 and 1 via a sigmoid function.

The problem is that the loss just repeats itself after epoch 1. It never converges to the correct result, which should be 1 if the answer belongs to the question and 0 otherwise.

My code is as follows:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# assumed to be defined earlier in the original script
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class QandA(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(QandA, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = 1
        self.bidirectional = True

        self.lstm = nn.LSTM(input_size, self.hidden_size, num_layers = self.num_layers, bidirectional = self.bidirectional)
        self.lstm.to(device)
        self.hidden2class = nn.Linear(self.hidden_size * 2, 1)
        self.hidden2class.to(device)

    def forward(self, glove_vec, glove_vec2):
        # glove_vec.shape = (sentence_len, batch_size, 300)
        output, hidden = self.lstm(glove_vec)
        output, _ = self.lstm(glove_vec2, hidden)
        # output.shape = (sentence_len, batch_size, hidden_size * 2)
        output = self.hidden2class(output[-1,:,:])
        # output.shape = (batch_size, 1)
        return F.sigmoid(output)

model = QandA(300, 60).to(device)
loss_function = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)
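
For reference (not part of the original post), nn.BCELoss expects the predictions and targets to have the same shape and a floating-point dtype, so the targets here would need to be floats of shape (batch_size, 1). A standalone illustration with dummy values:

import torch
import torch.nn as nn

pred = torch.rand(8, 1)                        # stand-in for the sigmoid output
targets = torch.randint(0, 2, (8, 1)).float()  # 0/1 labels must be floats
loss = nn.BCELoss()(pred, targets)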

Is my approach wrong and it simply does not work in practice, or am I overlooking some other problem?

Edit: additional code for the training loop:

batch_size = 1000
# load_dataset loads the data from the file.
questions, answers, outputs = load_dataset()
N = len(outputs)
losses = []
for epoch in range(10):
    for batch in range(math.ceil(N / batch_size)):
        model.zero_grad()

        # get_data gets the data from the dataset (size batch_size, sequence batch)
        input1, input2, targets = get_data(batch, batch_size)

        class_pred = model(input1, input2)
        loss = loss_function(class_pred, targets)
        loss.backward()
        optimizer.step()
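
get_data is not shown in the post; as a rough sketch of the shape it would have to produce, the (sentence_len, batch_size, 300) batches the model expects could be built by zero-padding the GloVe sequences to a common length (pad_batch is a hypothetical helper, not part of the original code):

import torch

def pad_batch(sentences, emb_dim=300):
    # sentences: list of tensors, each of shape (sentence_len_i, emb_dim)
    max_len = max(s.size(0) for s in sentences)
    batch = torch.zeros(max_len, len(sentences), emb_dim)
    for i, s in enumerate(sentences):
        batch[:s.size(0), i, :] = s
    return batch  # (max_len, batch_size, emb_dim)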
Tags: lstm, pytorch
1 Answer

I would suggest encoding the question and the answer independently and putting a classifier on top of that. For example, you can encode both the question and the answer with a biLSTM, concatenate their representations and feed that to the classifier. The code could look something like this (not tested, but hopefully you get the idea):

class QandA(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(QandA, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = 1
        self.bidirectional = True

        self.lstm_question = nn.LSTM(input_size, self.hidden_size, num_layers = self.num_layers, bidirectional = self.bidirectional)
        self.lstm_question.to(device)
        self.lstm_answer = nn.LSTM(input_size, self.hidden_size, num_layers = self.num_layers, bidirectional = self.bidirectional)
        self.lstm_answer.to(device)
        self.fc = nn.Linear(self.hidden_size * 4, 1)
        self.fc.to(device)

    def forward(self, glove_question, glove_answer):
        # glove.shape = (sentence_len, batch_size, 300)
        question_last_hidden, _ = self.lstm_question(glove_question)
        # question_last_hidden.shape = (question_len, batch_size, hidden_size * 2)
        answer_last_hidden, _ = self.lstm_answer(glove_answer)
        # answer_last_hidden.shape = (answer_len, batch_size, hidden_size * 2)

        # take the output at the last time step; with multiple LSTM layers you would
        # need only the last layer's forward/backward hidden states
        question_last_hidden = question_last_hidden[-1,:,:]
        answer_last_hidden = answer_last_hidden[-1,:,:]
        # concatenate over the feature dimension
        representation = torch.cat([question_last_hidden, answer_last_hidden], -1)
        # representation.shape = (batch_size, hidden_size * 4)
        output = self.fc(representation)
        # output.shape = (batch_size, 1)
        return F.sigmoid(output)
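
As a quick sanity check (not part of the original answer), the suggested model can be run on random tensors standing in for GloVe embeddings to confirm that the shapes line up:

import torch

# assumes the global `device` used above is defined (e.g. torch.device('cpu'))
model = QandA(input_size=300, hidden_size=60).to(device)
question = torch.randn(12, 4, 300, device=device)  # (question_len, batch_size, 300)
answer = torch.randn(30, 4, 300, device=device)    # (answer_len, batch_size, 300)
pred = model(question, answer)
print(pred.shape)                                   # torch.Size([4, 1])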