NaN value tensor in a custom OpenAI Gym environment


I am developing a custom environment for Boids flocking (https://en.wikipedia.org/wiki/Boids) and training it with PPO from Stable-Baselines3. The initial boid positions are read from a JSON file.

1. The action space holds an [x, y] velocity per agent as int, which is consumed by a mapping function that adds or subtracts the velocity (a minimal sketch of these spaces follows this list).

2. The observation space stores the agents' positions.
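
For reference, a minimal sketch of how such spaces might be declared. The bounds, the 20-agent count, and the use of a continuous Box space are assumptions taken from the log output below (the traceback builds a Normal distribution, which implies a Box action space), not the repository's actual code:

    import numpy as np
    from gym import spaces  # the warning below shows the old Gym API is in use

    NUM_AGENTS = 20  # assumed from the log: action shape (20, 2), observation shape (40,)

    # per-agent [x, y] velocity change (bounds are assumed)
    action_space = spaces.Box(low=-5.0, high=5.0, shape=(NUM_AGENTS, 2), dtype=np.float32)

    # flattened [x, y] positions of all agents
    observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(NUM_AGENTS * 2,), dtype=np.float32)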

Code: https://github.com/Hamza-101/Flocking-RL/blob/main/Simulation.py

Parameters (supporting file): https://github.com/Hamza-101/Flocking-RL/blob/main/Params.py

Error log (NaN value tensor):

UserWarning: You provided an OpenAI Gym environment. We strongly recommend transitioning to Gymnasium environments. Stable-Baselines3 is automatically wrapping your environments in a compatibility layer, which could potentially cause issues.
  warnings.warn(
Using cpu device
File loaded
Agent Locations:  [[0.3, -3.9], [9.3, 9.9], [1.2, -0.3], [-1.8, 6.1], [4.0, 6.9], [7.3, 5.6], [9.3, 5.7], [6.1, -4.9], [-2.8, -8.9], [-3.5, -3.0], [9.5, -3.2], [-8.2, 5.5], [-0.3, 7.5], [-2.7, 8.1], [-1.6, 0.6], [8.2, -7.2], [-3.8, 2.9], [2.8, 9.6], [-9.5, 7.1], [-4.9, 5.8]]
Action space:  (20, 2)
Observation space:  (40,)
Logging to ./ppo_Agents_tensorboard/PPO_10
Traceback (most recent call last):
  File "D:\Env.py", line 166, in <module>
    model.learn(total_timesteps=SimulationVariables["TimeSteps"])
  File "D:\Softwares\Anaconda\Lib\site-packages\stable_baselines3\ppo\ppo.py", line 308, in learn
    return super().learn(
  File "D:\Softwares\Anaconda\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 259, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "D:\Softwares\Anaconda\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 169, in collect_rollouts
    actions, values, log_probs = self.policy(obs_tensor)
  File "C:\Users\Cr7th\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Softwares\Anaconda\Lib\site-packages\stable_baselines3\common\policies.py", line 626, in forward
    distribution = self._get_action_dist_from_latent(latent_pi)
  File "D:\Softwares\Anaconda\Lib\site-packages\stable_baselines3\common\policies.py", line 656, in _get_action_dist_from_latent
    return self.action_dist.proba_distribution(mean_actions, self.log_std)
  File "D:\Softwares\Anaconda\Lib\site-packages\stable_baselines3\common\distributions.py", line 164, in proba_distribution
    self.distribution = Normal(mean_actions, action_std)
  File "C:\Users\Cr7th\AppData\Roaming\Python\Python311\site-packages\torch\distributions\normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\Cr7th\AppData\Roaming\Python\Python311\site-packages\torch\distributions\distribution.py", line 62, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1, 40)) of distribution Normal(loc: torch.Size([1, 40]), scale: torch.Size([1, 40])) to satisfy the constraint Real(), but found invalid values: tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
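
One way to narrow down where the NaN values first appear (observations, rewards, or actions) is Stable-Baselines3's VecCheckNan wrapper. A minimal sketch, where FlockingEnv is only a placeholder name for the custom Boids environment:

    from stable_baselines3 import PPO
    from stable_baselines3.common.vec_env import DummyVecEnv, VecCheckNan

    env = DummyVecEnv([lambda: FlockingEnv()])    # FlockingEnv: placeholder for the custom env class
    env = VecCheckNan(env, raise_exception=True)  # raise as soon as a NaN/inf is produced

    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)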

What I have tried:

Initializing with random velocities:

self.velocity = np.random.uniform(-SimulationVariables["VelocityInit"], SimulationVariables["VelocityInit"], size=2)

But no luck.
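
Clipping positions and velocities before building the observation is another thing worth checking, since unbounded growth during step() will eventually overflow to inf/NaN. A rough sketch of that idea (the attribute names and the ±10 bound are assumptions, not the repository's actual code):

    import numpy as np

    def _get_observation(self):
        # keep velocities and positions finite (bounds are assumed, not from Params.py)
        self.velocity = np.clip(self.velocity,
                                -SimulationVariables["VelocityInit"],
                                SimulationVariables["VelocityInit"])
        self.positions = np.clip(self.positions, -10.0, 10.0)
        obs = self.positions.flatten().astype(np.float32)
        # fail fast if something still became non-finite
        assert np.isfinite(obs).all(), "non-finite values in observation"
        return obs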

python reinforcement-learning openai-gym stable-baselines
1 Answer

My guess is that the problem is in reset(); https://github.com/Hamza-101/Flocking-RL/blob/main/TempSolution.py works now.
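
For reference, a sketch of a reset() that always returns a finite, correctly shaped observation (the agent count and the read_agent_locations helper are assumptions for illustration, not the code in TempSolution.py):

    import numpy as np

    def reset(self):
        # re-read the initial positions from the JSON file and restart from zero velocity
        self.positions = np.array(self.read_agent_locations(), dtype=np.float32)  # assumed helper, shape (20, 2)
        self.velocity = np.zeros_like(self.positions)
        obs = self.positions.flatten()  # shape (40,), matching the observation space
        assert np.isfinite(obs).all(), "non-finite values in reset observation"
        return obs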
