I'm trying to train actor-critic networks for multi-agent DDPG (MADDPG) over 10000 episodes, with 25 time steps per episode. After ten episodes of training, I hit the following error while computing gradients:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 100]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
My code for computing the gradients and updating the models is below.
for agent_idx, agent in enumerate(self.agents):
    # torch.autograd.set_detect_anomaly(True)

    # Target Q-values for the next states; zeroed for terminal transitions
    critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
    critic_value_[dones[:, 0]] = 0.0
    critic_value = agent.critic.forward(states, old_actions).flatten()

    # TD target and critic update
    target = rewards[:, agent_idx] + (agent.gamma * critic_value_)
    critic_loss = F.mse_loss(target, critic_value)
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    # Actor update: maximize the critic's value of the actor's actions
    actor_loss = agent.critic.forward(states, mu).flatten()
    actor_loss = -torch.mean(actor_loss)
    agent.actor.optimizer.zero_grad()
    actor_loss.backward(retain_graph=True)
    agent.actor.optimizer.step()

    # Soft-update the target networks
    agent.update_network_parameters()
I'm using PyTorch version 1.13.1+cu116. How can I fix this?
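To narrow it down, I tried to reproduce the failure mode outside the training loop. The sketch below is a hypothetical minimal example (a single `nn.Linear`, not my actual networks); it assumes the cause is that `optimizer.step()` updates parameters in place, so a later `backward()` through a graph that saved the old parameter values fails autograd's version check:

```python
import torch

# Minimal sketch (assumption: same failure mode as the loop above).
# optimizer.step() updates lin.weight in place, bumping its version
# counter; a second backward() through the retained graph, which saved
# the weight at the old version, then raises the in-place error.
lin = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

x = torch.randn(8, 4, requires_grad=True)
loss = lin(x).sum()

loss.backward(retain_graph=True)  # keep the graph for a second backward
opt.step()                        # in-place update of lin.weight

err = None
try:
    loss.backward()  # reuses the old graph -> version mismatch
except RuntimeError as e:
    err = e
print(err)
```

If this is indeed what is happening in my loop, I assume the fix is along the lines of detaching the TD target and not reusing a graph across an `optimizer.step()`, but I'd like to confirm the right way to structure the per-agent updates.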