How do I fix an error in a Q-Learning algorithm in R?

I am trying to implement the Q-Learning algorithm in R:


# Define the map
map <- matrix(c(0, 1, 1, 0, 0, 0, 0, 1), nrow = 2, ncol = 4, byrow = TRUE)
# State labels
rownames(map) <- c("Start", "End")
# Action labels
colnames(map) <- c("Up", "Down", "Left", "Right")
# Rewards for each state-action pair
rewards <- matrix(c(-1, -1, -1, -1, -1, -1, -1, 10), nrow = 2, ncol = 4, byrow = TRUE)

# Q-Learning Algorithm
q_learning <- function(P, R, gamma = 0.9, alpha = 0.1, epsilon = 0.1, max_iter = 1000) {
  # Initialize the Q-value function
  Q <- matrix(rep(0, nrow(P) * ncol(P)), nrow = nrow(P), ncol = ncol(P))
  # Initialize the state
  state <- sample(1:nrow(P), 1)
  # Iterate until convergence or maximum iterations reached
  for (i in 1:max_iter) {
    # Choose an action using epsilon-greedy policy
    if (runif(1) < epsilon) {
      action <- sample(1:ncol(P), 1)
    } else {
      action <- which.max(Q[state, ])
    }
    # Observe the next state and reward
    prob <- P[state, action]
    next_state <- sample(1:nrow(P), 1, prob = prob)
    reward <- R[state, action]
    # Update the Q-value function
    Q[state, action] <- Q[state, action] + alpha * (reward + gamma * max(Q[next_state, ]) - Q[state, action])
    # Update the state
    state <- next_state
  }
  # Derive the optimal policy (argmax in R using the which.max)
  policy <- apply(Q, 1, which.max)
  # Return the Q-value function and policy
  return(list(Q = Q, policy = policy))
}


# Run the Q-Learning Algorithm on the map
q_learning(P = map, R = rewards, gamma = 0.9, alpha = 0.1, epsilon = 0.1, max_iter = 1000)


I get an error from the sample function saying that the number of probabilities is incorrect.

Error in sample.int(length(x), size, replace, prob) :
incorrect number of probabilities

How can I fix this?

r reinforcement-learning sample
1 Answer

I'm not familiar with this algorithm, but judging from the code alone, you could try:

prob <- P[ , action] 

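This creates a vector of length nrow(P), which matches the number of values that sample() draws from (1:nrow(P)). You will need to work out the rest of the logic yourself!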
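The length mismatch behind the error can be reproduced in isolation. The sketch below reuses the map from the question; the fixed state and action values and the names scalar_prob, column_prob, err_msg and zero_msg are made up for illustration only.

```r
# The map from the question
map <- matrix(c(0, 1, 1, 0, 0, 0, 0, 1), nrow = 2, ncol = 4, byrow = TRUE)

state  <- 1  # illustrative values, not from the question
action <- 2

# A single cell is a length-1 vector, but sample() draws from nrow(map) = 2 values,
# so it needs one probability per value and raises an error here.
scalar_prob <- map[state, action]
err_msg <- tryCatch(
  sample(1:nrow(map), 1, prob = scalar_prob),
  error = function(e) conditionMessage(e)
)

# A whole column has nrow(map) entries, so the lengths agree.
column_prob <- map[, action]
next_state <- sample(1:nrow(map), 1, prob = column_prob)

# Caveat: column 1 of this map is all zeros, and sample() rejects that as well.
zero_msg <- tryCatch(
  sample(1:nrow(map), 1, prob = map[, 1]),
  error = function(e) conditionMessage(e)
)
```

Because Q starts at zero, which.max(Q[state, ]) returns 1 on the first greedy step, so the all-zero first column is hit straight away. Column indexing alone therefore does not make the original function run with this map: the probabilities passed to sample() need to describe a distribution over next states for each state-action pair.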

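One possible way to complete the logic is to store the transition probabilities in a three-dimensional array, so that P[state, action, ] always has one entry per candidate next state. The sketch below is not the asker's intended environment: the transition model (Right moves from Start to End, Right in End restarts at Start, every other action stays in place) is an assumption invented for illustration. Only the rewards matrix and the update rule are taken from the question.

```r
set.seed(1)  # fixed seed so the run is repeatable

states  <- c("Start", "End")
actions <- c("Up", "Down", "Left", "Right")

# ASSUMPTION: hypothetical transition model, not taken from the question.
# P[s, a, s2] = probability of reaching state s2 after taking action a in state s.
P <- array(0, dim = c(2, 4, 2), dimnames = list(states, actions, states))
P["Start", , "Start"] <- 1        # every action in Start stays in Start ...
P["Start", "Right", ] <- c(0, 1)  # ... except Right, which moves to End
P["End", , "End"]     <- 1        # every action in End stays in End ...
P["End", "Right", ]   <- c(1, 0)  # ... except Right, which restarts at Start

# Rewards as in the question: -1 everywhere, 10 for Right in End
rewards <- matrix(c(-1, -1, -1, -1, -1, -1, -1, 10), nrow = 2, ncol = 4,
                  byrow = TRUE, dimnames = list(states, actions))

q_learning <- function(P, R, gamma = 0.9, alpha = 0.1, epsilon = 0.1, max_iter = 1000) {
  n_states  <- dim(P)[1]
  n_actions <- dim(P)[2]
  Q <- matrix(0, nrow = n_states, ncol = n_actions, dimnames = dimnames(R))
  state <- sample.int(n_states, 1)
  for (i in seq_len(max_iter)) {
    # Epsilon-greedy action selection
    if (runif(1) < epsilon) {
      action <- sample.int(n_actions, 1)
    } else {
      action <- which.max(Q[state, ])
    }
    # prob now has exactly n_states entries, one per candidate next state
    next_state <- sample.int(n_states, 1, prob = P[state, action, ])
    reward <- R[state, action]
    # Same update rule as in the question
    Q[state, action] <- Q[state, action] +
      alpha * (reward + gamma * max(Q[next_state, ]) - Q[state, action])
    state <- next_state
  }
  list(Q = Q, policy = apply(Q, 1, which.max))
}

result <- q_learning(P, rewards)
```

The returned policy holds one action index per state, and actions[result$policy] turns those indices back into labels. With a different environment, only the contents of P need to change, as long as each P[s, a, ] sums to 1.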