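I'm trying to learn PyTorch and reinforcement learning through the library. I don't really understand batch size, and I keep getting this error message: "both arguments to matmul need to be at least 1D, but they are 0D and 2D"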
This is my code so far:
import numpy as np
import gym
import torch as T
import torch.multiprocessing as mp
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical
import torch.optim as optim


class network(nn.Module):
    def __init__(self, lr, input_dims, h1_dims, h2_dims, n_actions):
        super(network, self).__init__()
        self.lr = lr
        self.input_dims = input_dims
        self.h1_dims = h1_dims
        self.h2_dims = h2_dims
        self.n_actions = n_actions
        self.h1 = nn.Linear(self.input_dims, self.h1_dims)
        self.h2 = nn.Linear(self.h1_dims, self.h2_dims)
        self.p = nn.Linear(self.h2_dims, self.n_actions)
        self.v = nn.Linear(self.h2_dims, 1)
        self.optimizer = optim.Adam(self.parameters(), lr=self.lr)
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, observation):
        state = T.tensor(observation[0], dtype=T.float).to(self.device)
        x = F.relu(self.h1(state))  ##### here is the error ######
        x = F.relu(self.h2(x))
        v = self.v(x)
        p = F.softmax(self.p(x))
        return v, p


def choose_action(testing, observation):
    value, policy = testing.forward(observation[0])
    action_probs = T.distributions.Categorical(probs=policy)
    action = action_probs.sample()
    return action


env = gym.make('CartPole-v1')
action_spaces = env.action_space.n
network_instance = network(lr=0.005, input_dims=4, h1_dims=128,
                           h2_dims=128, n_actions=action_spaces)
for i in range(2):
    obs = env.reset()
    testing = choose_action(network_instance, obs)
    print(testing)
How can I fix this? Thanks.
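
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the observation is indexed with `[0]` twice, once in `choose_action` and again in `forward`. If the installed gym version returns an `(observation, info)` tuple from `env.reset()` (an assumption about the environment, not something the post states), the second `[0]` picks out a single number, so the tensor passed to `self.h1` is 0D. The snippet below uses a hypothetical `fake_reset` helper in place of the real environment so that it runs without gym.

```python
import torch as T
import torch.nn as nn


def fake_reset():
    # Hypothetical stand-in for env.reset(); ASSUMES a gym version that
    # returns an (observation, info) tuple for CartPole's 4 state values.
    return [0.01, -0.02, 0.03, 0.04], {}


layer = nn.Linear(4, 128)  # same shape as self.h1 in the post

obs = fake_reset()

# What the posted code effectively builds: [0] in choose_action, then [0]
# again in forward, leaving a single float and therefore a 0D tensor.
scalar_state = T.tensor(obs[0][0], dtype=T.float)
print(scalar_state.dim())  # 0

# Sketch of a fix: unpack the tuple once and pass the whole observation.
state, _info = fake_reset()
state_t = T.tensor(state, dtype=T.float)
print(state_t.shape)  # torch.Size([4])

out = layer(state_t)  # a 1D input of length 4 is accepted by nn.Linear
print(out.shape)  # torch.Size([128])
```

Under that assumption, removing one of the two `[0]` indexings (for example, using `T.tensor(observation, dtype=T.float)` inside `forward`) should give the first linear layer a 1D input of length 4. No batch dimension is strictly required here, because `nn.Linear` operates on the last dimension of its input.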