Is there a way to implement an OpenAI Gym environment where the action space changes at each step?

Question. Votes: 2. Answers: 2

Is there a way to implement an OpenAI Gym environment where the action space changes at every step?

openai-gym
2 Answers
5
votes

Yes (although some prebuilt agents may not work in this case).

@property
def action_space(self):
    # Do some code here to calculate the available actions
    return Something

The @property decorator is there so that you fit the standard format of Gym environments, where action_space is an attribute, env.action_space, rather than a method, env.action_space().
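To make the idea concrete, here is a minimal sketch of how a property-based action_space could behave. It is an illustration only: `CountdownEnv` and `ActionSet` are hypothetical names, and to stay dependency-free the classes do not subclass gym.Env or gym.Space. In a real environment you would presumably return a proper Space object instead.

```python
import random


class ActionSet:
    """Hypothetical stand-in for a gym Space: holds the currently valid actions."""

    def __init__(self, actions):
        self.actions = list(actions)

    def sample(self):
        return random.choice(self.actions)

    def contains(self, x):
        return x in self.actions


class CountdownEnv:
    """Toy environment (not a real gym.Env): action k subtracts k from a counter."""

    def __init__(self, start=5, max_step=3):
        self.remaining = start
        self.max_step = max_step

    @property
    def action_space(self):
        # Recomputed on every access, so it always reflects the current state.
        upper = min(self.max_step, self.remaining)
        return ActionSet(range(1, upper + 1))

    def step(self, action):
        if not self.action_space.contains(action):
            raise ValueError(f"action {action} is not available")
        self.remaining -= action
        done = self.remaining == 0
        # Same (observation, reward, done, info) layout as the classic Gym API.
        return self.remaining, 0.0, done, {}


env = CountdownEnv(start=5, max_step=3)
print(env.action_space.actions)  # [1, 2, 3]
env.step(3)
print(env.action_space.actions)  # [1, 2]
```

Callers still write env.action_space.sample() as usual; the only difference is that the set of actions behind it is rebuilt from the environment state on each access.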


0
votes
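  1. You can implement your own subclass of Space and override shape, sample() and contains() so that they return values consistent with the currently available actions. Your environment then returns an instance of this custom class as its action_space, and you update it from within the environment at every step.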

    This can be done through additional methods that you provide yourself, such as disable_actions() and enable_actions(), as follows:

    import gym
    import numpy as np


    # You could also inherit from Discrete or Box here and just override
    # shape, sample() and contains()
    class Dynamic(gym.Space):
        """
        x where x in available actions {0,1,3,5,...,n-1}
        Example usage:
        self.action_space = Dynamic(max_space=2)
        """

        def __init__(self, max_space):
            self.n = max_space

            # Initially all actions are available
            self.available_actions = list(range(max_space))

        def disable_actions(self, actions):
            """You would call this method inside your environment to remove available actions"""
            self.available_actions = [action for action in self.available_actions if action not in actions]
            return self.available_actions

        def enable_actions(self, actions):
            """You would call this method inside your environment to enable actions"""
            # Add each action that is not already available
            for action in actions:
                if action not in self.available_actions:
                    self.available_actions.append(action)
            return self.available_actions

        def sample(self):
            return np.random.choice(self.available_actions)

        def contains(self, x):
            return x in self.available_actions

        @property
        def shape(self):
            """Return the new shape here"""
            return ()

        def __repr__(self):
            return "Dynamic(%d)" % self.n

        def __eq__(self, other):
            return isinstance(other, Dynamic) and self.n == other.n

    
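  2. You can also restrict the actions inside the agent, so that it only considers valid actions, but this gets in the way of using existing general-purpose agents.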

I found that this link explains it very well (it is too long to quote here): How do I let AI know that only some actions are available during specific states in reinforcement learning?
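As a rough illustration of option 2 (restricting actions inside the agent), the sketch below masks an epsilon-greedy choice to the currently valid actions. The function name `masked_epsilon_greedy` and its parameters are hypothetical, and this is not taken from the linked answer. It assumes the agent keeps one value estimate per action id and is told which ids are valid in the current state.

```python
import random


def masked_epsilon_greedy(q_values, available_actions, epsilon=0.1, rng=random):
    """Pick an action among `available_actions` only.

    q_values: sequence of estimated values, indexed by action id.
    available_actions: iterable of action ids valid in the current state.
    """
    available = list(available_actions)
    if not available:
        raise ValueError("no actions available in this state")
    if rng.random() < epsilon:
        # Explore, but only among valid actions.
        return rng.choice(available)
    # Exploit: best-valued action among the valid ones.
    return max(available, key=lambda a: q_values[a])


q = [0.2, 0.9, 0.5, 0.1]
# Action 1 has the highest value but is not available here.
print(masked_epsilon_greedy(q, [0, 2, 3], epsilon=0.0))  # 2
```

The list of valid ids could come from something like the available_actions attribute of the custom space in option 1, which is one way the two approaches might be combined.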
