Is there a way to implement an OpenAI gym environment where the action space changes at each step?
Yes (although some prebuilt agents may not work in that case).
@property
def action_space(self):
    # Do some code here to calculate the available actions
    return Something
The @property decorator is there so you can fit the standard format of a gym environment, where action_space is the attribute env.action_space rather than the method call env.action_space().
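To make the idea concrete, here is a minimal sketch of that pattern in a hypothetical toy environment (the class name, the card-playing rules, and the omission of the gym.Env base class are my own, chosen so the snippet runs standalone):

```python
class CardGameEnv:  # would normally subclass gym.Env
    """Hypothetical toy environment: each card in hand can be played once."""

    def __init__(self):
        self.hand = [0, 1, 2, 3]  # available actions = cards still in hand

    @property
    def action_space(self):
        # Recomputed on every access, so it always reflects the current state
        return list(self.hand)

    def step(self, action):
        assert action in self.action_space, "illegal action"
        self.hand.remove(action)
        done = len(self.hand) == 0
        return list(self.hand), 0.0, done, {}

env = CardGameEnv()
print(env.action_space)  # [0, 1, 2, 3]
env.step(2)
print(env.action_space)  # [0, 1, 3]
```

Because the property is recomputed on every access, any agent that reads env.action_space before acting sees only the currently legal moves.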
import gym
import numpy as np

# You could also inherit from Discrete or Box here and just override the
# shape, sample() and contains() methods
class Dynamic(gym.Space):
    """
    x where x in available actions {0,1,3,5,...,n-1}

    Example usage:
    self.action_space = spaces.Dynamic(max_space=2)
    """

    def __init__(self, max_space):
        super().__init__()  # initialize the base Space (defaults are fine here)
        self.n = max_space
        # Initially all actions are available
        self.available_actions = list(range(max_space))

    def disable_actions(self, actions):
        """You would call this method inside your environment to remove available actions"""
        self.available_actions = [action for action in self.available_actions
                                  if action not in actions]
        return self.available_actions

    def enable_actions(self, actions):
        """You would call this method inside your environment to re-enable actions"""
        # list.append returns None (and would append the whole list as a single
        # element), so merge the two collections instead
        self.available_actions = sorted(set(self.available_actions) | set(actions))
        return self.available_actions

    def sample(self):
        return np.random.choice(self.available_actions)

    def contains(self, x):
        return x in self.available_actions

    @property
    def shape(self):
        """Return the new shape here"""
        return ()

    def __repr__(self):
        return "Dynamic(%d)" % self.n

    def __eq__(self, other):
        return self.n == other.n
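For illustration, here is how the disable/enable cycle behaves. To keep the snippet runnable without gym installed, a stripped-down stand-in class (my own, mirroring the methods above but without the gym.Space base and using the stdlib random module) is used:

```python
import random

class DynamicDemo:
    """Stand-in for the Dynamic space above, minus the gym.Space base class."""

    def __init__(self, max_space):
        self.n = max_space
        self.available_actions = list(range(max_space))

    def disable_actions(self, actions):
        self.available_actions = [a for a in self.available_actions
                                  if a not in actions]
        return self.available_actions

    def enable_actions(self, actions):
        self.available_actions = sorted(set(self.available_actions) | set(actions))
        return self.available_actions

    def sample(self):
        return random.choice(self.available_actions)

    def contains(self, x):
        return x in self.available_actions

space = DynamicDemo(5)
print(space.available_actions)   # [0, 1, 2, 3, 4]
space.disable_actions([1, 3])
print(space.available_actions)   # [0, 2, 4]
space.enable_actions([3])
print(space.available_actions)   # [0, 2, 3, 4]
```

sample() then only ever returns actions from the current available set, which is what lets the environment mask out illegal moves step by step.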
I found this link explains it well (it is too long to quote here): How do I let AI know that only some actions are available during specific states in reinforcement learning?