I have two questions related to the input requirements of an LSTM model. My LSTM needs 3D input tensors, fed from a replay buffer (the replay buffer itself is a deque) that stores each experience as a tuple of components. The LSTM requires each component to be a single value rather than a sequence.
state_dim = 21; batch_size = 32
Problems:
1. The NumPy array returned by batch sampling is one-dimensional (1D), while 3D is required. Using np.reshape, np.expand_dims, or np.asarray does not work; each attempt ends in an error such as ValueError: cannot reshape array of size 32 into shape (32,1,21).
2. When I use array broadcasting as a workaround (for testing only; I don't want broadcasting in my final code at all), converting the array to a tensor raises another error: ValueError: setting an array element with a sequence. (Both symptoms are reproduced in the sketch below.)
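For what it's worth, both symptoms can be reproduced when the stored states don't all have exactly 21 scalar features (an assumption on my part; I haven't confirmed the buffer contents), in which case NumPy falls back to a 1D array of list objects:

import numpy as np

# Hypothetical reproduction: one state with the wrong length means np.array
# cannot build a (32, 21) float block and yields a 1D object array instead.
ragged = [[0.0] * 21 for _ in range(31)] + [[0.0] * 20]
states = np.array(ragged, dtype=object)
print(states.shape)  # (32,) -- hence "cannot reshape array of size 32 ..."
# Converting such an object array to a float tensor then fails with an error
# like: ValueError: setting an array element with a sequence.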
Code structure:
There is a function that returns the state as a list of 21 features; call it def get_state():
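(For reference, a minimal stand-in with the same signature; the real feature extraction is omitted from this post:)

# Stand-in only -- the real get_state() computes 21 actual features.
def get_state():
    return [0.0] * state_dim  # a flat list of 21 floats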
It is then used by this part of the code:
def sample_batch():
    global batch_size
    batch_indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[index] for index in batch_indices]
    states = np.array([np.array(item[0], dtype=np.float32) for item in batch])
    actions = np.array([item[1] for item in batch])
    rewards = np.array([item[2] for item in batch])
    next_states = np.array([np.array(item[3], dtype=np.float32) for item in batch])
    done_flags = np.array([item[4] for item in batch])
    # Ensure states and next_states are 2-dimensional arrays - this is the workaround that was mentioned but it should not be in the final code
    if states.ndim == 1:
        states = states[:, np.newaxis]
    if next_states.ndim == 1:
        next_states = next_states[:, np.newaxis]
    n_timesteps = 1  # Specify the number of timesteps
    # Add a time_steps dimension to states and next_states using broadcasting - this is the workaround that was mentioned but it should not be in the final code
    states = states[:, np.newaxis, :] * np.ones((1, n_timesteps, 1), dtype=np.float32)
    next_states = next_states[:, np.newaxis, :] * np.ones((1, n_timesteps, 1), dtype=np.float32)
    # Convert to tensor
    states = tf.convert_to_tensor(states, dtype=tf.float32)
    next_states = tf.convert_to_tensor(next_states, dtype=tf.float32)
    return states, actions, rewards, next_states, done_flags
# Update the network based on target model Q-values
def update_network():
    global batch_size
    states, actions, rewards, next_states, done_flags = sample_batch()
    Q_values = model.predict(states)
    Q_values_next = target_model.predict(next_states)
    for i in range(batch_size):
        if done_flags[i]:
            Q_values[i][actions[i]] = rewards[i]
        else:
            Q_values[i][actions[i]] = rewards[i] + gamma * np.max(Q_values_next[i])
    model.train_on_batch(states, Q_values)
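Side note: the per-sample loop above could also be written in vectorized form; this sketch (assuming done_flags is a boolean array and rewards is a 1D float array, and not taken from my actual code) should be equivalent:

# Vectorized Q-target computation (sketch; equivalent to the loop above)
targets = np.where(done_flags,
                   rewards,
                   rewards + gamma * np.max(Q_values_next, axis=1))
Q_values[np.arange(batch_size), actions] = targets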
# Define max steps per episode
max_steps_per_episode = 1000
# Set the number of episodes over which to calculate the average reward
average_over_episodes = 5
# Initialize a list to store the rewards for each episode
episode_rewards = []
# Keep track of the best average reward
best_avg_reward = float('-inf')
# Definition of Q-learning + definition of state as state = get_state()
for episode in range(num_episodes):
    state = get_state()
    episode_reward = 0
    # Decay epsilon (episode 0 keeps the initial value; later episodes decay)
    epsilon = initial_epsilon * (decay_rate ** episode)
    step = 0
    done = False
    while not done:
        # Choose action using epsilon-greedy policy
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            Q_values_single = model.predict(np.array([state]))
            action = np.argmax(Q_values_single)
        # Take action and get reward
        take_action(action)
        reward = get_reward()
        next_state = get_state()
        # Add experience to replay buffer
        replay_buffer.append((state, action, reward, next_state, done))
        # Accumulate reward, advance to the next state, and stop at the step limit
        episode_reward += reward
        state = next_state
        step += 1
        if step >= max_steps_per_episode:
            done = True
By the way, there are two models, an online model and a target model:
# Choose optimizer:
optimizer = keras.optimizers.RMSprop(learning_rate=alpha)
# Set up target model -- influenced by the Deep Q-learning paper by Mnih et al. (2015)
target_model = keras.Sequential(
    [
        layers.LSTM(64, input_shape=(None, state_dim)),
        layers.Dense(num_actions),
    ]
)
target_model.compile(loss="mse", optimizer=optimizer)
# Define a function to update the target network's weights
def update_target_network():
    target_model.set_weights(model.get_weights())
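For completeness, a typical pattern is to call this sync periodically; target_update_freq below is a hypothetical placeholder, not a value from my actual code:

target_update_freq = 50  # hypothetical sync cadence
if episode % target_update_freq == 0:
    update_target_network()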
# Set up online model and load weights
model = keras.Sequential(
    [
        layers.LSTM(64, input_shape=(None, state_dim)),
        layers.Dense(num_actions),
    ]
)
model.compile(loss="mse", optimizer=optimizer)
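With input_shape=(None, state_dim), the LSTM accepts any number of timesteps, but the input must still be 3D; a quick sanity check with dummy data only:

dummy = np.zeros((1, 1, state_dim), dtype=np.float32)  # (batch, timesteps, features)
print(model.predict(dummy).shape)  # (1, num_actions)
# A 2D input such as np.zeros((1, state_dim)) would raise a shape error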
As mentioned above, I did try various approaches to resolve these errors: np.reshape, np.expand_dims, np.asarray, array broadcasting (which did work, but revealed another problem with the sequential nature of the replay buffer components), and dtype=np.float32.
I would appreciate any ideas on how to prepare the LSTM input so that the state feature data is a 3D array and each state array element is a single value (even though a state consists of a sequence of 21 features).
The errors appear to be caused by the shape of the input data going into the LSTM model. As you mentioned, the LSTM expects 3D input tensors, so we need to make sure the input data is shaped correctly before feeding it to the LSTM.
Based on the code you provided, states and next_states after batch sampling seem to have shape (batch_size, 21). However, the LSTM expects input of shape (batch_size, n_timesteps, state_dim), where n_timesteps is the number of timesteps in the sequence and state_dim is the number of features in the state. We therefore need to reshape states and next_states to match this shape.
To do this, we can use NumPy's reshape function. For example, to reshape states to (batch_size, 1, state_dim), we can use the following code:
states = np.reshape(states, (batch_size, 1, state_dim))
We can reshape next_states the same way. Note that the second dimension of the new shape is 1, because we feed the LSTM only one timestep at a time.
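For next_states the corresponding line is:

next_states = np.reshape(next_states, (batch_size, 1, state_dim))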
Once states and next_states are reshaped correctly, we can convert them to tensors with tf.convert_to_tensor as before.
Here is an updated sample_batch function that should work:
def sample_batch():
    global batch_size
    batch_indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[index] for index in batch_indices]
    states = np.array([np.array(item[0], dtype=np.float32) for item in batch])
    actions = np.array([item[1] for item in batch])
    rewards = np.array([item[2] for item in batch])
    next_states = np.array([np.array(item[3], dtype=np.float32) for item in batch])
    done_flags = np.array([item[4] for item in batch])
    n_timesteps = 1  # Specify the number of timesteps
    # Reshape states and next_states to have shape (batch_size, n_timesteps, state_dim)
    states = np.reshape(states, (batch_size, n_timesteps, state_dim))
    next_states = np.reshape(next_states, (batch_size, n_timesteps, state_dim))
    # Convert to tensor
    states = tf.convert_to_tensor(states, dtype=tf.float32)
    next_states = tf.convert_to_tensor(next_states, dtype=tf.float32)
    return states, actions, rewards, next_states, done_flags
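You can quickly verify the shapes after sampling (assuming the buffer already holds at least batch_size transitions):

states, actions, rewards, next_states, done_flags = sample_batch()
print(states.shape)       # expected: (32, 1, 21)
print(next_states.shape)  # expected: (32, 1, 21)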
Hope this helps you resolve the input shape issue for your LSTM model!