Is my DDQN network implemented correctly?


Below is my implementation of the replay/training function. I set up the DDQN so that model2 lags model by one batch during replay/training; setting self.ddqn = False turns it back into a plain DQN. Is this implemented correctly? I used this paper as a reference:

http://papers.nips.cc/paper/3964-double-q-learning.pdf

DDQN code

    def replay(self, batch_size):
        if self.ddqn:
            # refresh the target network: copy the online model's weights into model2
            self.model2.load_state_dict(self.model.state_dict())
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            state = torch.Tensor(state)
            next_state = torch.Tensor(next_state)
            if self.cuda:
                state = state.cuda()
                next_state = next_state.cuda()
            Q_current = self.model(state)
            Q_target = Q_current.clone() # TODO: test copy.deepcopy() and Tensor.copy_()
            # action selection: argmax over the online model's next-state values
            Q_next = (1-done)*self.model(next_state).cpu().detach().numpy()
            next_action = np.argmax(Q_next)
            if self.ddqn:
                # action evaluation: use the target network's next-state values instead
                Q_next = (1-done)*self.model2(next_state).cpu().detach().numpy()
            Q_target[action] = Q_current[action] + self.alpha*(reward + self.gamma*Q_next[next_action] - Q_current[action])

            self.optim.zero_grad()
            loss = self.loss(Q_current, Q_target)
            loss.backward()
            self.optim.step()

        if self.epsilon > self.epsilon_min:
            # decay the exploration rate, bounded below by epsilon_min
            self.epsilon = max(self.epsilon*self.epsilon_decay, self.epsilon_min)
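For context, this is roughly how replay() gets called from my training loop (a minimal sketch; env, agent, agent.act, EPISODES, and BATCH_SIZE are placeholders, not part of the code above):

    # Sketch of the driving loop: model2 is refreshed from model at the start of
    # every replay() call, so it lags the online model by one replay step.
    for episode in range(EPISODES):
        state = env.reset()
        done = False
        while not done:
            action = agent.act(state)                       # epsilon-greedy action (assumed helper)
            next_state, reward, done, _ = env.step(action)
            agent.memory.append((state, action, reward, next_state, done))
            state = next_state
            if len(agent.memory) >= BATCH_SIZE:
                agent.replay(BATCH_SIZE)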
machine-learning pytorch reinforcement-learning dqn
1 Answer

I would suggest moving the next_action line below and using an if-else:

if self.ddqn:
    Q_next = (1-done)*self.model2(next_state).cpu().detach().numpy()
else:
    Q_next = (1-done)*self.model(next_state).cpu().detach().numpy()
next_action = np.argmax(Q_next)

The rest looks fine.
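With that restructuring, the forward pass through self.model(next_state) is skipped whenever DDQN is enabled, so each sample needs only one next-state evaluation. Folding the suggestion into the existing update, the inner loop body would read roughly like this (a minimal sketch reusing the question's own names, nothing new added beyond the reordering):

if self.ddqn:
    Q_next = (1-done)*self.model2(next_state).cpu().detach().numpy()
else:
    Q_next = (1-done)*self.model(next_state).cpu().detach().numpy()
next_action = np.argmax(Q_next)  # greedy action under whichever network produced Q_next
Q_target[action] = Q_current[action] + self.alpha*(reward + self.gamma*Q_next[next_action] - Q_current[action])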
