Problem with the observation space of a multi-agent environment in RLlib


    class CapacityEnv(MultiAgentEnv):
        def __init__(self):
            self.action_space = Discrete(2)  # 0 not transmit, 1 transmit
            self.observation_space = Dict({
                "1": Box(low=0, high=101, dtype=int),  # Maximum capacity is 100 for each agent
                "2": Box(low=0, high=101, dtype=int)
            })
            self.node_capacity = {"1": 100, "2": 100}

        def step(self, action_dict):
            node_choice_1 = action_dict["1"]
            node_choice_2 = action_dict["2"]
            rewards = {"1": 0, "2": 0}
            if node_choice_1 == node_choice_2:
                rewards = {"1": -10, "2": -10}
            if node_choice_1 == 0 and node_choice_2 == 1:
                if self.node_capacity["2"] >= self.node_capacity["1"]:
                    rewards = {"1": 10, "2": 10}
                else:
                    rewards = {1: -10, 2: -10}
                self.node_capacity["1"] = self.node_capacity["1"]
                self.node_capacity["2"] = self.node_capacity["2"] - 5
            elif node_choice_1 == 1 and node_choice_2 == 0:
                if self.node_capacity["1"] >= self.node_capacity["2"]:
                    rewards = {"1": 10, "2": 10}
                else:
                    rewards = {"1": -10, "2": -10}
                self.node_capacity["1"] = self.node_capacity["1"] - 5
                self.node_capacity["2"] = self.node_capacity["2"]
            print(self.node_capacity)
            observations = self.node_capacity
            if self.node_capacity["1"] == 0 or self.node_capacity["2"] == 0:
                done = True
            else:
                done = False
            return observations, rewards, done, False, {}

        def reset(self, *, seed=None, options=None):
            self.node_capacity = {"1": 100, "2": 100}
            print(self.node_capacity)
            observations = self.node_capacity
            return observations, {}

However, when I train my agent with RLlib:

    from ray.tune.logger.logger import pretty_print

    config = DQNConfig().environment(CapacityEnv).training(gamma=0.9, lr=0.001, train_batch_size=512)
    agent = config.build()
    for i in range(2):
        result = agent.train()
        print(pretty_print(result))

    ValueError                                Traceback (most recent call last)
    <ipython-input-69-7e2ec9487891> in <cell line: 2>()
          1 from ray.tune.logger.logger import pretty_print
          2 for i in range(2):
    ----> 3     result = agent.train()
          4     print(pretty_print(result))

    20 frames
    /usr/local/lib/python3.10/dist-packages/tree/__init__.py in assert_same_structure(a, b, check_types)
        286     str1 = str(map_structure(lambda _: _DOT, a))
        287     str2 = str(map_structure(lambda _: _DOT, b))
    --> 288     raise type(e)("%s\n"
        289                   "Entire first structure:\n%s\n"
        290                   "Entire second structure:\n%s"

    ValueError: The two structures don't have the same nested structure.

    First structure: type=int str=100

    Second structure: type=OrderedDict str=OrderedDict([('1', 55), ('2', 94)])

    More specifically: Substructure "type=OrderedDict str=OrderedDict([('1', 55), ('2', 94)])" is a sequence, while substructure "type=int str=100" is not
    Entire first structure:
    .
    Entire second structure:
    OrderedDict([('1', .), ('2', .)])

I think the problem is related to the observation space. I have been trying to fix it but have not managed to, particularly in the context of RLlib and MultiAgentEnv. Any guidance or insight on how to resolve this would be much appreciated. Thanks!

reinforcement-learning openai-gym ray multi-agent rllib
A few things need to be fixed in the env implementation.

  1. RLlib expects the observation space to be defined per agent. You can use a simple observation space and set the agent IDs, like this:

    self.observation_space = Box(low=0, high=101, dtype=int)
    self._agent_ids = ["1", "2"]
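  2. If your observation space is a Box, RLlib expects each observation to be a numpy ndarray. So you should build a dict that looks like {"1": array([99]), "2": array([2])}. Specifically, the error above is raised right after reset() is called, because the returned observation does not match the space. You can see what an observation should look like by sampling from the observation space:

    self.node_capacity["1"] = self.observation_space.sample()
    self.node_capacity["2"] = self.observation_space.sample()
    observations = self.node_capacity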
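  3. Your done (terminated) and truncated return values should also be dicts, containing a boolean for each agent plus an "__all__" key. RLlib will complain otherwise, because it expects multi-agent dicts.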
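
The dict shape described in point 3 can be sketched as below. This is only an illustration, assuming the two agent IDs "1" and "2" from the question and the question's rule that the episode ends for everyone once any node runs out of capacity. The helper name `make_done_dicts` is hypothetical and not part of RLlib.

```python
AGENT_IDS = ("1", "2")  # agent IDs taken from the question


def make_done_dicts(node_capacity):
    """Build per-agent terminated/truncated dicts, each with an "__all__" entry.

    Hypothetical helper: the episode is treated as over for every agent
    as soon as any node's capacity reaches zero.
    """
    episode_over = any(node_capacity[aid] <= 0 for aid in AGENT_IDS)

    terminateds = {aid: episode_over for aid in AGENT_IDS}
    terminateds["__all__"] = episode_over

    # The question never truncates episodes, so every flag stays False.
    truncateds = {aid: False for aid in AGENT_IDS}
    truncateds["__all__"] = False

    return terminateds, truncateds
```

In `step()`, the last line would then return these two dicts in place of the bare `done, False` pair.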

I would suggest keeping the observations and rewards as ints/numpy arrays internally, and converting them to dicts just before returning from reset() and step().
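
A rough sketch of that suggestion follows, written as a plain Python class so it runs without Ray installed. In real use the class would subclass `MultiAgentEnv` and declare `observation_space`, `action_space` and `_agent_ids` as in point 1. The class name `CapacityCore` and the helper `_obs` are made up for illustration, the reward logic is one reading of the question's code, and the shape-(1,) arrays assume the Box has shape `(1,)`.

```python
import numpy as np


class CapacityCore:
    """Illustrative stand-in for the question's env (not an RLlib class)."""

    AGENT_IDS = ("1", "2")

    def __init__(self):
        # Internal state stays as plain ints.
        self.node_capacity = {"1": 100, "2": 100}

    def _obs(self):
        # Convert at the boundary: one shape-(1,) integer array per agent ID.
        return {
            aid: np.array([self.node_capacity[aid]], dtype=np.int64)
            for aid in self.AGENT_IDS
        }

    def reset(self, *, seed=None, options=None):
        self.node_capacity = {"1": 100, "2": 100}
        return self._obs(), {}

    def step(self, action_dict):
        a1, a2 = action_dict["1"], action_dict["2"]
        if a1 == a2:
            # Both transmit or both stay silent: penalise, capacities unchanged.
            reward = -10
        else:
            sender, other = ("1", "2") if a1 == 1 else ("2", "1")
            # Reward when the node with at least as much capacity transmits.
            ok = self.node_capacity[sender] >= self.node_capacity[other]
            reward = 10 if ok else -10
            self.node_capacity[sender] -= 5

        # Always use the string agent IDs as keys.
        rewards = {aid: reward for aid in self.AGENT_IDS}

        done = any(self.node_capacity[aid] <= 0 for aid in self.AGENT_IDS)
        terminateds = {"1": done, "2": done, "__all__": done}
        truncateds = {"1": False, "2": False, "__all__": False}
        return self._obs(), rewards, terminateds, truncateds, {}
```

Keeping the state as ints means the game logic never touches arrays, and every value that leaves the env goes through the same conversion, so the keys and dtypes cannot drift apart between `reset()` and `step()`.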

