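I know that SB3 provides various tools for building vectorized environments. I want to restrict myself to using only the vectorized environments and implement the RL algorithm from scratch. Is this possible? My end goal is to learn how RL hyperparameters behave with parallel environments, so that I can speed up learning. Currently, I am stuck at: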
import os
import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv
env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    next_obs, reward, done, info = env.step(action)
    obs = next_obs
But I get the following error:
Traceback (most recent call last):
  File "D:\q_learning\dummy_envs.py", line 9, in <module>
    next_obs, reward, done, info = env.step(action)
  File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 197, in step
    return self.step_wait()
  File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 59, in step_wait
    self.actions[env_idx]
IndexError: invalid index to scalar variable.