How do I set up RLlib multi-agent PPO?

Question · votes: 0 · answers: 1

I set up a very simple multi-agent environment for ray.rllib, and I'm trying to run a baseline test of a simple PPO-versus-random-policy training scenario, as shown below.

from ray import tune
from ray.tune.registry import register_env

register_env("my_env", lambda _: MyEnv(num_agents=2))
mock = MyEnv()
obs_space = mock.observation_space
act_space = mock.action_space
tune.run(
    "PPO",
    stop={"training_iteration": args.num_iters},
    config={
        "env": "my_env",
        "num_gpus": 1,
        "multiagent": {
            "policies": {
                "ppo_policy": (None, obs_space, act_space, {}),
                "random": (RandomPolicy, obs_space, act_space, {}),
            },
            # Note: the mapping must reference "ppo_policy" exactly as defined
            # above ("appo_policy" would be an unknown policy ID).
            "policy_mapping_fn": (
                lambda agent_id: {1: "ppo_policy", 2: "random"}[agent_id]),
        },
    },
)
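For context, here is a minimal sketch of what a two-agent `MyEnv` might look like. This is an assumption, not code from the question: a real implementation would subclass `ray.rllib.env.MultiAgentEnv` and use `gym.spaces` objects for `observation_space` and `action_space`; plain Python stands in here so that the multi-agent reset/step dict contract (including agent IDs 1 and 2, which the `policy_mapping_fn` above keys on) is visible on its own.

```python
import random

# Hypothetical stand-in for MyEnv; a real version subclasses
# ray.rllib.env.MultiAgentEnv and uses gym.spaces for the spaces below.
class MyEnv:
    def __init__(self, num_agents=2):
        self.num_agents = num_agents
        # Agent IDs 1..num_agents, matching the policy_mapping_fn keys.
        self.agent_ids = list(range(1, num_agents + 1))
        self.observation_space = None  # e.g. gym.spaces.Box(...) in a real env
        self.action_space = None       # e.g. gym.spaces.Discrete(...) in a real env
        self.t = 0

    def reset(self):
        self.t = 0
        # Multi-agent envs return a dict mapping agent_id -> observation.
        return {i: 0.0 for i in self.agent_ids}

    def step(self, action_dict):
        self.t += 1
        obs = {i: random.random() for i in self.agent_ids}
        rewards = {i: 0.0 for i in self.agent_ids}
        # The special "__all__" key tells RLlib whether the episode is over.
        dones = {"__all__": self.t >= 10}
        infos = {i: {} for i in self.agent_ids}
        return obs, rewards, dones, infos
```

The key point is that observations, rewards, dones, and infos are all per-agent dicts, and `dones["__all__"]` controls episode termination.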

When I run this, I get the following error.

Traceback (most recent call last):
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 381, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.__init__() (pid=18163, ip=192.168.1.25)
  File "python/ray/_raylet.pyx", line 414, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 450, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 455, in __init__
    super().__init__(config, logger_creator)
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/trainable.py", line 174, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 596, in _setup
    self._init(self.config, self.env_creator)
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 129, in _init
    self.optimizer = make_policy_optimizer(self.workers, config)
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/ppo/ppo.py", line 95, in choose_policy_optimizer
    shuffle_sequences=config["shuffle_sequences"])
  File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 99, in __init__
    "Only TF graph policies are supported with multi-GPU. "
ValueError: Only TF graph policies are supported with multi-GPU. Try setting `simple_optimizer=True` instead.

I tried setting simple_optimizer: True in the config, but that gave me a NotImplementedError in the set_weights function of RLlib's Policy class...

I changed "PPO" to "PG" in the config and it ran fine, so this is unlikely to be related to how I've defined the environment. Is there any way to fix this?

reinforcement-learning ray multi-agent rllib
1 answer
0 votes

Have a look at this question. You should define the following on your RandomPolicy:

def get_weights(self):
    return None

def set_weights(self, weights):
    pass
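Putting the answer together, a hedged sketch of what such a RandomPolicy could look like. This is assumed code, not from the answer: in real RLlib the class would subclass `ray.rllib.policy.Policy`, which is stubbed out below so the sketch runs standalone, and the 0/1 action choice is a hypothetical placeholder for sampling from the actual `action_space`. The point is the no-op `get_weights`/`set_weights` pair, which avoids the NotImplementedError seen in the question.

```python
import random

# Stand-in for ray.rllib.policy.Policy so this sketch runs without Ray
# installed; a real RandomPolicy would subclass that class instead.
class Policy:
    def __init__(self, observation_space, action_space, config):
        self.observation_space = observation_space
        self.action_space = action_space
        self.config = config

class RandomPolicy(Policy):
    """Picks random actions and never learns."""

    def compute_actions(self, obs_batch, **kwargs):
        # One action per observation in the batch; a real env would sample
        # from self.action_space rather than this hypothetical 0/1 choice.
        actions = [random.choice([0, 1]) for _ in obs_batch]
        return actions, [], {}

    def learn_on_batch(self, samples):
        return {}  # nothing to learn

    def get_weights(self):
        return None  # no trainable weights to report

    def set_weights(self, weights):
        pass  # no-op: this is what avoids the NotImplementedError
```

With both methods defined as no-ops, RLlib can sync weights across workers without the random policy objecting.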