How to use DeepQLearning in Julia for very large state spaces?

Question

I would like to use the DeepQLearning.jl package from https://github.com/JuliaPOMDP/DeepQLearning.jl. To do so, we have to do something similar to the following:

using DeepQLearning
using POMDPs
using Flux
using POMDPModels
using POMDPSimulators
using POMDPPolicies

# load MDP model from POMDPModels or define your own!
mdp = SimpleGridWorld();

# Define the Q network (see Flux.jl documentation)
# the gridworld state is represented by a 2 dimensional vector.
model = Chain(Dense(2, 32), Dense(32, length(actions(mdp))))

exploration = EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.01, steps=10000/2))

solver = DeepQLearningSolver(qnetwork = model, max_steps=10000, 
                             exploration_policy = exploration,
                             learning_rate=0.005,log_freq=500,
                             recurrence=false,double_q=true, dueling=true, prioritized_replay=true)
policy = solve(solver, mdp)

sim = RolloutSimulator(max_steps=30)
r_tot = simulate(sim, mdp, policy)
println("Total discounted reward for 1 simulation: $r_tot")

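In the line mdp = SimpleGridWorld(), we create the MDP. When I tried to create my own MDP, I ran into the problem of a very large state space. A state in my MDP is a vector in {1,2,...,m}^n for some m and n. So, when defining the function POMDPs.states(mdp::myMDP), I realized that I must iterate over all the states, and there are very many of them, namely m^n.

Am I using the package the wrong way? Or must we iterate over the states even when there are exponentially many? If the latter, what is the point of using deep Q-learning? I thought deep Q-learning helps when the action and state spaces are very large.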

Answer 1

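DeepQLearning does not require enumerating the state space and can handle continuous-space problems. DeepQLearning.jl only uses the generative interface of POMDPs.jl. As such, you do not need to implement the states function, only gen and initialstate (see the POMDPs.jl documentation on how to implement the generative interface).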

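However, because DQN works with discrete actions, you also need POMDPs.actions(mdp::YourMDP), which should return an iterator over the action space.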

With those modifications to your implementation, you should be able to use the solver.
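As a rough illustration of the interface described above, here is a minimal sketch of an MDP whose states are vectors in {1,...,m}^n, defined without any states function. The type VectorMDP, its fields, and its dynamics and reward are hypothetical and made up for this example. The sketch also assumes POMDPs.jl 0.9 or later, where initialstate returns a distribution, and it uses ImplicitDistribution from POMDPModelTools.jl.

```julia
using POMDPs
using POMDPModelTools: ImplicitDistribution
using Random

# Hypothetical MDP: a state is a Vector{Int} in {1,...,m}^n, an action is an Int.
struct VectorMDP <: MDP{Vector{Int}, Int}
    m::Int  # number of values each component can take
    n::Int  # number of components in a state
end

# DQN needs a discrete action space: an iterator over all actions.
# Here (by assumption) action a means "resample component a".
POMDPs.actions(mdp::VectorMDP) = 1:mdp.n

POMDPs.discount(::VectorMDP) = 0.95

# Initial states are sampled, never enumerated.
POMDPs.initialstate(mdp::VectorMDP) =
    ImplicitDistribution(rng -> rand(rng, 1:mdp.m, mdp.n))

# Generative model: sample a next state and a reward from (s, a).
# The dynamics and reward below are invented purely for illustration.
function POMDPs.gen(mdp::VectorMDP, s, a, rng::AbstractRNG)
    sp = copy(s)
    sp[a] = rand(rng, 1:mdp.m)
    r = sp[a] == mdp.m ? 1.0 : 0.0
    return (sp=sp, r=r)
end
```

In principle, an instance such as VectorMDP(10, 20) can then be passed to solve together with a Q-network whose input layer has n = 20 units and whose output layer has length(actions(mdp)) units.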

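The neural network in DQN takes a vector representation of the state as input. If your state is an n-dimensional vector, the neural network input will have size n. The output size of the network equals the number of actions in your model.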

In the grid world example, the input size of the Flux model is 2 (x, y positions) and the output size is length(actions(mdp)).
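To make the size argument concrete, the following small sketch compares the number of states that enumeration would have to visit with the number of inputs the Q-network needs. The values of m and n are hypothetical and chosen only for illustration.

```julia
# Hypothetical sizes, chosen only for illustration.
m, n = 10, 20

# Number of distinct states in {1,...,m}^n. BigInt avoids overflow for larger m and n.
num_states = big(m)^n

# A single state is a length-n vector; the Q-network receives it as n Float32 inputs.
s = rand(1:m, n)
x = Float32.(s)

println("states to enumerate: ", num_states)
println("network input size:  ", length(x))
```

The network input grows linearly in n, while the state count grows exponentially, which is why a sampling-based (generative) model is enough for DQN.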
