OpenAI Gym environment with an action space that changes after each step

Question

Is there a way to implement an OpenAI Gym environment in which the action space changes at each step?

Answer 1

Yes (although some of the premade agents may not work in this case).

@property
def action_space(self):
    # Do some code here to calculate the available actions
    return Something

The @property decorator lets you keep the standard format of a Gym environment, where action_space is an attribute (env.action_space) rather than a method (env.action_space()).
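As a minimal sketch of the property approach, the toy environment below recomputes its valid actions on every access. CorridorEnv, its LEFT/RIGHT constants, and the reward scheme are hypothetical names made up for illustration. It only mimics the Gym interface and returns a plain tuple so that it runs without Gym installed; a real environment would presumably subclass gym.Env and return a gym.spaces object instead.

```python
class CorridorEnv:
    """Toy corridor that mimics the Gym interface (hypothetical example)."""

    LEFT, RIGHT = 0, 1

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    @property
    def action_space(self):
        # Recomputed on every access, so it always reflects the current state.
        actions = []
        if self.position > 0:
            actions.append(self.LEFT)
        if self.position < self.length - 1:
            actions.append(self.RIGHT)
        return tuple(actions)

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Reject moves that are not available in the current state.
        if action not in self.action_space:
            raise ValueError(f"action {action} is not available at position {self.position}")
        self.position += 1 if action == self.RIGHT else -1
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


env = CorridorEnv(length=3)
env.reset()
print(env.action_space)  # only RIGHT is valid at the left wall
env.step(CorridorEnv.RIGHT)
print(env.action_space)  # both moves are valid in the middle
```

Because callers still read env.action_space as an attribute, code that queries the space before each step keeps working; agents that cache the space once at construction time will not see the updates.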

Answer 2

You can implement your own Space subclass and override shape, sample(), and contains() so that they return values consistent with the currently available actions. Your environment then returns an instance of this custom class as its action_space, and you update that instance from inside the environment at each step. The updates can go through additional methods that you provide, such as disable_actions() and enable_actions(), as follows:

import gym
import numpy as np

# You could also inherit from Discrete or Box here and just override
# shape, sample() and contains().
class Dynamic(gym.Space):
    """
    x where x in available actions {0,1,3,5,...,n-1}
    Example usage:
    self.action_space = Dynamic(max_space=2)
    """

    def __init__(self, max_space):
        self.n = max_space
        # Initially all actions are available. Use a list, not a range,
        # so that actions can be removed and added later.
        self.available_actions = list(range(max_space))

    def disable_actions(self, actions):
        """Call this inside your environment to remove available actions."""
        self.available_actions = [action for action in self.available_actions
                                  if action not in actions]
        return self.available_actions

    def enable_actions(self, actions):
        """Call this inside your environment to re-enable actions."""
        # list.append() returns None, so build the merged list explicitly.
        self.available_actions = sorted(set(self.available_actions) | set(actions))
        return self.available_actions

    def sample(self):
        return np.random.choice(self.available_actions)

    def contains(self, x):
        return x in self.available_actions

    @property
    def shape(self):
        """Return the new shape here."""
        return ()

    def __repr__(self):
        return "Dynamic(%d)" % self.n

    def __eq__(self, other):
        return self.n == other.n

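You can also restrict the actions inside the agent, letting it consider only the valid ones, but that prevents you from using existing general-purpose agents.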
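One way to restrict actions on the agent side is to mask invalid actions at selection time. The sketch below is an assumption about how that could look for a tabular, epsilon-greedy agent; masked_epsilon_greedy and its parameters are hypothetical names, not part of Gym or of any agent library.

```python
import random


def masked_epsilon_greedy(q_values, valid_actions, epsilon=0.1, rng=random):
    """Choose an action among valid_actions only (hypothetical helper).

    q_values: sequence of estimated values indexed by action id.
    valid_actions: iterable of action ids allowed in the current state.
    """
    valid_actions = list(valid_actions)
    if not valid_actions:
        raise ValueError("no valid actions available")
    # Explore: pick uniformly among the valid actions only.
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    # Exploit: take the best-valued action, ignoring invalid ones.
    return max(valid_actions, key=lambda a: q_values[a])


q_values = [0.2, 0.9, 0.5, 0.1]
# Action 1 has the highest value but is not available in this state.
action = masked_epsilon_greedy(q_values, valid_actions=[0, 2, 3], epsilon=0.0)
print(action)  # 2
```

The environment would supply valid_actions at each step, for example from the available_actions list of a custom space.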

I found that this link explains it well (it is too long to quote here): How do I let AI know that only some actions are available during specific states in reinforcement learning?
