I'm running into a problem with the mean squared error computed in the critic loss of a DDPG agent. The error message indicates a shape mismatch between the expected and actual shapes of the td_targets and q_values tensors in the DDPG agent's critic loss function.
Here are the relevant code snippets:
# Create the agent
self.ddpg_agent = DdpgAgent(
    time_step_spec=self.tf_env.time_step_spec(),
    action_spec=self.tf_env.action_spec(),
    actor_network=actor_network,
    critic_network=critic_network,
    actor_optimizer=Adam(learning_rate=self.learning_rate),
    critic_optimizer=Adam(learning_rate=self.learning_rate),
    gamma=self.discount_factor,
    target_update_tau=0.01,
    ou_stddev=0.3,
    ou_damping=0.3,
    td_errors_loss_fn=common.element_wise_squared_loss,
)
# Initialize the replay buffer
replay_buffer = replay_buffers.tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=self.ddpg_agent.collect_data_spec,
    batch_size=1,
    max_length=5000)
# Add experiences to the replay buffer
experience = trajectory.from_transition(time_step, action_step, next_time_step)
replay_buffer.add_batch(experience)
# Create the dataset
dataset = replay_buffer.as_dataset(
    sample_batch_size=self.batch_size,  # self.batch_size = 32
    num_steps=2,
    num_parallel_calls=3,
    single_deterministic_pass=False
).prefetch(3)
# Train the agent
iterator = iter(dataset)
experience_set, _ = next(iterator)
loss = self.ddpg_agent.train(experience_set)
When I run the code, it aborts during the loss computation with this error:
File "main.py", line 138, in <module>
main()
File "main.py", line 109, in main
a2c.train_agent()
File "a2c.py", line 41, in train_agent
self.agent.train_agent()
File "agent.py", line 161, in train_agent
loss = self.ddpg_agent.train(experience_set)
File "tf_agents\agents\tf_agent.py", line 330, in train
loss_info = self._train_fn(
File "tf_agents\utils\common.py", line 188, in with_check_resource_vars
return fn(*fn_args, **fn_kwargs)
File "tf_agents\agents\ddpg\ddpg_agent.py", line 247, in _train
critic_loss = self.critic_loss(time_steps, actions, next_time_steps,
File "tf_agents\agents\ddpg\ddpg_agent.py", line 343, in critic_loss
critic_loss = self._td_errors_loss_fn(td_targets, q_values)
File "tf_agents\utils\common.py", line 1139, in element_wise_squared_loss
return tf.compat.v1.losses.mean_squared_error(
File "tensorflow\python\util\traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "tensorflow\python\framework\tensor_shape.py", line 1361, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (32, 1) and (32, 32) are incompatible
I checked all the spec shapes, experience shapes, and output shapes of the actor and critic networks. They all look correct: with a batch size of 32, the actor and critic output layers produce the expected shape (32, 1). Yet td_targets and q_values do not match inside the loss function in tf_agents\agents\ddpg\ddpg_agent.py:

TD targets shape: (32, 32)
Q values shape: (32, 1)

Can anyone tell me what I'm missing here?
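One plausible explanation for the (32, 32) shape, sketched below with NumPy stand-ins: the TD target is roughly reward + gamma * next_q, and if the rewards have shape (32,) while the target critic returns (32, 1), broadcasting produces a (32, 32) td_targets tensor. The strict element-wise loss then rejects it against the (32, 1) q_values. The shapes are taken from the error message; the strict check is a simplified stand-in for what tf.compat.v1.losses.mean_squared_error asserts, not the library code itself.

```python
import numpy as np

rewards = np.zeros(32)      # (32,)   per-step rewards
next_q = np.zeros((32, 1))  # (32, 1) target critic output with a trailing dim
gamma = 0.99

# (32,) + (32, 1) broadcasts to (32, 32) -- the shape from the error message:
td_targets = rewards + gamma * next_q
q_values = np.zeros((32, 1))

# common.element_wise_squared_loss wraps tf.compat.v1.losses.mean_squared_error,
# which asserts identical shapes before subtracting (simplified stand-in):
def strict_elementwise_squared_loss(targets, predictions):
    if targets.shape != predictions.shape:
        raise ValueError(
            f"Shapes {predictions.shape} and {targets.shape} are incompatible")
    return (targets - predictions) ** 2

try:
    strict_elementwise_squared_loss(td_targets, q_values)
except ValueError as e:
    print(e)  # Shapes (32, 1) and (32, 32) are incompatible
```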
I solved this by choosing a different loss function when initializing the DDPG agent:
# Create the agent
self.ddpg_agent = DdpgAgent(
    time_step_spec=self.tf_env.time_step_spec(),
    action_spec=self.tf_env.action_spec(),
    actor_network=actor_network,
    critic_network=critic_network,
    actor_optimizer=Adam(learning_rate=self.learning_rate),
    critic_optimizer=Adam(learning_rate=self.learning_rate),
    gamma=self.discount_factor,
    target_update_tau=0.01,
    ou_stddev=0.3,
    ou_damping=0.3,
    td_errors_loss_fn=tf.keras.losses.MeanSquaredError(),
)