import gym
from stable_baselines3 import A2C
env = gym.make('LunarLander-v2', render_mode="human")
env.reset()
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1000)
episodes = 10
for ep in range(episodes):
    obs = env.reset()
    done = False
    while not done:
        action, _states, _episode, _determ = model.predict(obs)
        obs, rewards, done, info = env.step(action)
        env.render()
env.close()
My code above produces the following output:
DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`. (Deprecated NumPy 1.24)
if not isinstance(terminated, (bool, np.bool8)):
------------------------------------
| rollout/ | |
| ep_len_mean | 89.2 |
| ep_rew_mean | -227 |
| time/ | |
| fps | 43 |
| iterations | 100 |
| time_elapsed | 11 |
| total_timesteps | 500 |
| train/ | |
| entropy_loss | -1.29 |
| explained_variance | -0.0216 |
| learning_rate | 0.0007 |
| n_updates | 99 |
| policy_loss | 2.79 |
| value_loss | 12.3 |
------------------------------------
------------------------------------
| rollout/ | |
| ep_len_mean | 107 |
| ep_rew_mean | -209 |
| time/ | |
| fps | 45 |
| iterations | 200 |
| time_elapsed | 21 |
| total_timesteps | 1000 |
| train/ | |
| entropy_loss | -0.864 |
| explained_variance | -0.00161 |
| learning_rate | 0.0007 |
| n_updates | 199 |
| policy_loss | -16.6 |
| value_loss | 228 |
------------------------------------
This is followed by this error:
Traceback (most recent call last):
File "c:\Appu\Courses\Fun projects\Reinforcement Learning\c1.py", line 17, in <module>
action, _states, _episode, _determ = model.predict(obs)
^^^^^^^^^^^^^^^^^^
File "C:\Users\sarav\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\base_class.py", line 555, in predict
return self.policy.predict(observation, state, episode_start, deterministic)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sarav\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\policies.py", line 346, in predict
observation, vectorized_env = self.obs_to_tensor(observation)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sarav\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\policies.py", line 260, in obs_to_tensor
observation = np.array(observation)
^^^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
When I run the code, it runs for a few timesteps and then exits with the error above. Is there a fix for this?
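The problem is most likely caused by API changes in newer versions of gym. See the solutions in "Solution for changed API in new gym".
For example, you can downgrade gym to an older, compatible version such as gym==0.18.0. Alternatively, you can change your code so that it unpacks only the values it needs and passes the right one to numpy. In the new API, env.reset() returns a tuple (observation, info), and handing that whole tuple to model.predict() is what makes np.array fail with "The detected shape was (2,)". For example:
obs, _ = env.reset()
You may also need to adjust the calls to predict() and step() in the same way, so that the number of names on the left matches what each call returns.
See the gym API site https://www.gymlibrary.dev/api/core/ for more details on what changed in the new gym versions.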
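
To illustrate the unpacking described above, here is a minimal sketch of the evaluation loop written against the return shapes of the newer API. It does not import gym or stable_baselines3: `StubEnv`, `stub_predict`, and `run_episode` are hypothetical names made up for this example. The stub assumes that `reset()` returns `(observation, info)`, that `step()` returns `(observation, reward, terminated, truncated, info)`, and that `predict()` returns `(action, state)`, which is how gym 0.26+ and stable_baselines3 behave as far as I know.

```python
import random


class StubEnv:
    """Hypothetical stand-in that only mimics the gym>=0.26 return shapes."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        obs = [0.0] * 8          # LunarLander observations have 8 components
        return obs, {}           # new API: (observation, info)

    def step(self, action):
        self.t += 1
        obs = [float(self.t)] * 8
        reward = -1.0
        terminated = False                    # episode ended by the task itself
        truncated = self.t >= self.max_steps  # episode cut off by a step limit
        return obs, reward, terminated, truncated, {}


def stub_predict(obs):
    """Hypothetical stand-in for model.predict: returns (action, state)."""
    return random.randrange(4), None  # LunarLander has 4 discrete actions


def run_episode(env, predict):
    """Run one episode and return (number of steps, total reward)."""
    obs, _info = env.reset()          # unpack the tuple, keep only obs
    done = False
    steps, total = 0, 0.0
    while not done:
        action, _state = predict(obs)  # two return values, not four
        obs, reward, terminated, truncated, _info = env.step(action)
        done = terminated or truncated
        steps += 1
        total += reward
    return steps, total


if __name__ == "__main__":
    print(run_episode(StubEnv(), stub_predict))
```

With the real libraries, the same loop body should carry over by replacing `StubEnv()` with the environment from `gym.make(...)` and `stub_predict` with `model.predict`. Treat that as something to verify against the versions you have installed.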