Facing a problem with reinforcement learning

Problem description
import gym
from stable_baselines3 import A2C

env = gym.make('LunarLander-v2', render_mode="human")  
env.reset()

model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1000)

episodes = 10

for ep in range(episodes):
    obs = env.reset()
    done = False
    while not done:
        action, _states, _episode, _determ = model.predict(obs)
        obs, rewards, done, info = env.step(action)
        env.render()

env.close()     

The code above produces the following output:

DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
  if not isinstance(terminated, (bool, np.bool8)):
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 89.2     |
|    ep_rew_mean        | -227     |
| time/                 |          |
|    fps                | 43       |
|    iterations         | 100      |
|    time_elapsed       | 11       |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -1.29    |
|    explained_variance | -0.0216  |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | 2.79     |
|    value_loss         | 12.3     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 107      |
|    ep_rew_mean        | -209     |
| time/                 |          |
|    fps                | 45       |
|    iterations         | 200      |
|    time_elapsed       | 21       |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -0.864   |
|    explained_variance | -0.00161 |
|    learning_rate      | 0.0007   |
|    n_updates          | 199      |
|    policy_loss        | -16.6    |
|    value_loss         | 228      |
------------------------------------

This is followed by this error:

Traceback (most recent call last):
  File "c:\Appu\Courses\Fun projects\Reinforcement Learning\c1.py", line 17, in <module>
    action, _states, _episode, _determ = model.predict(obs)
                                         ^^^^^^^^^^^^^^^^^^
  File "C:\Users\sarav\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\base_class.py", line 555, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sarav\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\policies.py", line 346, in predict
    observation, vectorized_env = self.obs_to_tensor(observation)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sarav\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\policies.py", line 260, in obs_to_tensor
    observation = np.array(observation)
                  ^^^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

When I run the code, it runs for a few timesteps and then exits with the error above. Is there a way to fix this?

python machine-learning reinforcement-learning openai-gym
1 Answer

The problem is most likely caused by the API changes in newer versions of gym. In the new API, env.reset() returns an (observation, info) tuple, so passing that tuple straight to model.predict() makes the internal np.array(observation) call fail with the inhomogeneous-shape ValueError shown above. See the solutions for the changed API below.

For example, you can downgrade gym to a compatible older version, such as gym==0.18.0 (via pip install gym==0.18.0). Alternatively, you can change your code to unpack only the values you need, so that a plain observation array, rather than a tuple, is passed on to numpy. For example:

obs, _ = env.reset()

You may also need to adjust your code for the similarly changed gym APIs around step() and the values you pass to predict(); see the sketch below.
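
For reference, here is a minimal sketch of the full script adapted to the new API. It assumes gym >= 0.26 (the version whose reset()/step() signatures match the traceback above) and a stable_baselines3 release that accepts such an environment:

import gym
from stable_baselines3 import A2C

env = gym.make("LunarLander-v2", render_mode="human")

model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1000)

episodes = 10
for ep in range(episodes):
    # New API: reset() returns an (observation, info) tuple.
    obs, info = env.reset()
    done = False
    while not done:
        # predict() returns exactly two values: the action and the policy state.
        action, _states = model.predict(obs)
        # New API: step() returns five values instead of four.
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        # With render_mode="human", each step renders automatically.

env.close()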

See the gym API documentation at https://www.gymlibrary.dev/api/core/ for more details on the changes in the new gym versions.
