Matplotlib - how do I plot the progress of each episode during training?

Question

I am trying to build a deep Q-learning agent that plays CartPole. Using reinforcement learning, it learns to balance the pole by moving the cart.

My model works, but I don't know how to plot the training progress. I want to plot games played versus score, similar to this image:

https://github.com/JulesVerny/PongReinforcementLearning/blob/master/ScoreGrowth.png

I have been using matplotlib, but I can't seem to figure it out.

I have been able to display a plot, but it just shows up blank. Not really sure where to go from here.

Here is my code:

import random
import gym
import numpy as np
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from matplotlib import pyplot as plt

EPISODES = 10

class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0  # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()

    def _build_model(self):
        # Neural Net for Deep-Q learning Model
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse',
                      optimizer=Adam(lr=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])  # returns action

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = (reward + self.gamma *
                          np.amax(self.model.predict(next_state)[0]))
            target_f = self.model.predict(state)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def load(self, name):
        self.model.load_weights(name)

    def save(self, name):
        self.model.save_weights(name)

if __name__ == "__main__":
    env = gym.make('CartPole-v1')
    state_size = env.observation_space.shape[0]
    action_size = env.action_space.n
    agent = DQNAgent(state_size, action_size)
    # agent.load("/home/jack/Desktop/cartpole-dqn.h5")
    done = False
    batch_size = 32

    for e in range(EPISODES):
        state = env.reset()
        state = np.reshape(state, [1, state_size])
        for time in range(500):
            # env.render()
            action = agent.act(state)
            next_state, reward, done, _ = env.step(action)
            reward = reward if not done else -10
            next_state = np.reshape(next_state, [1, state_size])
            agent.remember(state, action, reward, next_state, done)
            state = next_state
            if done:
                print("episode: {}/{}, score: {}, e: {:.2}"
                      .format(e, EPISODES, time, agent.epsilon))
                break
            if len(agent.memory) > batch_size:
                agent.replay(batch_size)
        if e % 10 == 0:
            agent.save("/home/jack/Desktop/cartpole-dqn.h5")

Any ideas?

python-3.x matplotlib machine-learning keras openai-gym
1 Answer

A simple way to do this is to initialize a list of rewards right after you define the batch size, for example:

rewardList = []

Then initialize a reward accumulator for each episode. Define it right after you reset the environment:

accu_reward = 0

Then, inside the time loop, right after the env.step call (and before the if done: block that breaks out of the loop), put:

accu_reward += reward
if done or time == 499:
    rewardList.append(accu_reward)

Then, at the very bottom of your script:

plt.plot(rewardList)
plt.show()
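If you also want labelled axes like in the ScoreGrowth.png plot you linked, you can add these standard matplotlib calls just before plt.show() (the label text is just a suggestion):

plt.xlabel('Episode')
plt.ylabel('Score')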

This should give you the development of the reward over the whole training run.
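Putting it all together, here is a minimal sketch of what the modified main loop could look like (only rewardList and accu_reward are new; everything else is your original code, with the score printout omitted for brevity):

rewardList = []                               # one total reward per episode

for e in range(EPISODES):
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    accu_reward = 0                           # per-episode accumulator
    for time in range(500):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        accu_reward += reward                 # accumulate the step reward
        if done or time == 499:
            rewardList.append(accu_reward)    # record the finished episode
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if done:
            break
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)

plt.plot(rewardList)
plt.show()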

Here you append to the reward list once per episode, after the episode ends. You could also append to rewardList after every step, but that takes much more memory and the resulting curve contains a lot more variance.
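If the curve turns out very noisy, one common option is to plot a moving average on top of the raw values. A minimal sketch using numpy (the window size of 10 is an arbitrary choice):

window = 10
if len(rewardList) >= window:
    smoothed = np.convolve(rewardList, np.ones(window) / window, mode='valid')
    plt.plot(rewardList, alpha=0.3, label='per-episode reward')
    plt.plot(range(window - 1, len(rewardList)), smoothed, label='moving average')
    plt.legend()
    plt.show()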
