How do I solve the dimensionality and ordering problems for LSTM input?

Problem description · 0 votes · 1 answer

I have two questions related to the input requirements of an LSTM model. My LSTM needs 3D tensors as input, but the data comes from a replay buffer (the replay buffer itself is a deque) that stores each experience as a tuple of several components. The LSTM also requires each component to be a single value rather than a sequence.

state_dim = 21; batch_size = 32

Questions:

  1. The NumPy array returned by batch sampling is one-dimensional (1D), whereas 3D is required. Neither np.reshape, np.expand_dims, nor np.asarray works; each returns an error such as ValueError: cannot reshape array of size 32 into shape (32,1,21) (a sketch reproducing this error follows this list).

  2. When I use array broadcasting as a workaround (purely as a test; I do not want broadcasting in the final code), a different error appears when converting the array to a tensor: ValueError: setting an array element with a sequence.
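For context, here is a minimal sketch of how the first error can arise (my own reconstruction, not part of the original code): if even one of the stored state lists does not contain exactly 21 values, NumPy falls back to a 1D object array of size 32, which cannot be reshaped to (32, 1, 21). Depending on the NumPy version, building such an array may also warn or fail outright.

import numpy as np

batch_size, state_dim = 32, 21

# All states should be lists of 21 floats; make one of them too short on purpose.
ragged_states = [[0.0] * state_dim for _ in range(batch_size)]
ragged_states[0] = [0.0] * (state_dim - 1)

# Because the lists are ragged, NumPy stores them as a 1D object array of size 32.
states = np.array(ragged_states, dtype=object)
print(states.shape)  # (32,) instead of (32, 21)

# This raises the reported error:
# ValueError: cannot reshape array of size 32 into shape (32,1,21)
states.reshape(batch_size, 1, state_dim)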

Code structure:

There is a function that returns the state as a list of 21 features; call it def get_state():

It is then used by this part of the code:

def sample_batch():
    global batch_size
    batch_indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[index] for index in batch_indices]

    states = np.array([np.array(item[0], dtype=np.float32) for item in batch])
    actions = np.array([item[1] for item in batch])
    rewards = np.array([item[2] for item in batch])
    next_states = np.array([np.array(item[3], dtype=np.float32) for item in batch])
    done_flags = np.array([item[4] for item in batch])

    # Ensure states and next_states are 2-dimensional arrays - this is the workaround that was mentioned but it should not be in the final code
    if states.ndim == 1:
        states = states[:, np.newaxis]
    if next_states.ndim == 1:
        next_states = next_states[:, np.newaxis]

    n_timesteps = 1  # Specify the number of timesteps

    # Add a time_steps dimension to states and next_states using broadcasting - this is the workaround that was mentioned but it should not be in the final code
    states = states[:, np.newaxis, :] * np.ones((1, n_timesteps, 1), dtype=np.float32)
    next_states = next_states[:, np.newaxis, :] * np.ones((1, n_timesteps, 1), dtype=np.float32)

    # Convert to tensor
    states = tf.convert_to_tensor(states, dtype=tf.float32)
    next_states = tf.convert_to_tensor(next_states, dtype=tf.float32)

    return states, actions, rewards, next_states, done_flags

# Update the network based on target model Q-values
def update_network():
    global batch_size
    states, actions, rewards, next_states, done_flags = sample_batch()

    Q_values = model.predict(states)
    Q_values_next = target_model.predict(next_states)

    for i in range(batch_size):
        if done_flags[i]:
            Q_values[i][actions[i]] = rewards[i]
        else:
            Q_values[i][actions[i]] = rewards[i] + gamma * np.max(Q_values_next[i])

    model.train_on_batch(states, Q_values)


# Define max steps per episode
max_steps_per_episode = 1000

# Set the number of episodes over which to calculate the average reward
average_over_episodes = 5

# Initialize a list to store the rewards for each episode
episode_rewards = []

# Keep track of the best average reward
best_avg_reward = float('-inf')

# Definition of Q-learning + definition of state as state = get_state()
for episode in range(num_episodes):
    state = get_state()
    episode_reward = 0
    # Decay epsilon (episode - 1 is to start from initial value and then decay in the next episode)
    epsilon = initial_epsilon * (decay_rate ** (episode - 1))
    step = 0
    done = False

    while not done:
        # Choose action using epsilon-greedy policy
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            Q_values_single = model.predict(np.array([state]))
            action = np.argmax(Q_values_single)

        # Take action and get reward
        take_action(action)
        reward = get_reward()
        next_state = get_state()

        # Add experience to replay buffer
        replay_buffer.append((state, action, reward, next_state, done))

Incidentally, there are two models involved, an online model and a target model:

# Choose optimizer:
optimizer = keras.optimizers.RMSprop(learning_rate=alpha)

# Set up target model -- influenced by Deep Q-learning paper by Minh et al. (2015)
target_model = keras.Sequential(
    [
        layers.LSTM(64, input_shape=(None, state_dim)),
        layers.Dense(num_actions),
    ]
)
target_model.compile(loss="mse", optimizer=optimizer)


# Define a function to update the target network's weights
def update_target_network():
    target_model.set_weights(model.get_weights())


# Set up online model and load weights
model = keras.Sequential(
    [
        layers.LSTM(64, input_shape=(None, state_dim)),
        layers.Dense(num_actions),
    ]
)

model.compile(loss="mse", optimizer=optimizer)

As mentioned above, I did try various ways to resolve these errors: np.reshape, np.expand_dims, np.asarray, array broadcasting (which does run but exposes the second problem with the sequential nature of the replay-buffer components), and dtype=np.float32.

I would be grateful for any ideas on how to prepare the LSTM input so that the state feature data are 3D arrays and each element of a state array is a single value (even though a state consists of 21 features).

python numpy artificial-intelligence lstm
1 Answer

0 votes

The error appears to be caused by the shape of the input data fed to the LSTM model. As you mentioned, the LSTM expects 3D tensors as input, so we need to make sure the data is shaped correctly before it is passed to the LSTM.

Based on the code you provided, states and next_states appear to have shape (batch_size, 21) after batch sampling. The LSTM, however, expects an input of shape (batch_size, n_timesteps, state_dim), where n_timesteps is the number of timesteps in the sequence and state_dim is the number of features per state. We therefore need to reshape states and next_states to match this shape.

To do this, we can use NumPy's reshape function. For example, to reshape states to (batch_size, 1, state_dim), we can use the following code:

states = np.reshape(states, (batch_size, 1, state_dim))

Similarly, we can reshape next_states to the same shape. Note that the second dimension of the new shape is 1 because we feed the LSTM only one timestep at a time.
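As a side note, np.expand_dims gives the same result without hardcoding batch_size, which can be convenient if the sampled batch is ever smaller than batch_size:

# Equivalent to np.reshape(states, (batch_size, 1, state_dim)),
# but infers the batch dimension from the array itself.
states = np.expand_dims(states, axis=1)        # (batch_size, 21) -> (batch_size, 1, 21)
next_states = np.expand_dims(next_states, axis=1)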

Once states and next_states have been reshaped correctly, we can convert them to tensors with tf.convert_to_tensor as before.

Here is an updated sample_batch function that should work:

def sample_batch():
    global batch_size
    batch_indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[index] for index in batch_indices]

    states = np.array([np.array(item[0], dtype=np.float32) for item in batch])
    actions = np.array([item[1] for item in batch])
    rewards = np.array([item[2] for item in batch])
    next_states = np.array([np.array(item[3], dtype=np.float32) for item in batch])
    done_flags = np.array([item[4] for item in batch])

    n_timesteps = 1  # Specify the number of timesteps

    # Reshape states and next_states to have shape (batch_size, n_timesteps, state_dim)
    states = np.reshape(states, (batch_size, n_timesteps, state_dim))
    next_states = np.reshape(next_states, (batch_size, n_timesteps, state_dim))

    # Convert to tensor
    states = tf.convert_to_tensor(states, dtype=tf.float32)
    next_states = tf.convert_to_tensor(next_states, dtype=tf.float32)

    return states, actions, rewards, next_states, done_flags
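One caveat, going back to the original error message: ValueError: cannot reshape array of size 32 suggests that states may be coming out of the list comprehension as a 1D object array rather than a (32, 21) float array, which happens when the stored state lists do not all contain exactly 21 values. In that case no reshape will help. A hedged sketch that surfaces the offending entries before stacking (the helper name stack_states is mine, not part of the original code):

def stack_states(batch, index, state_dim=21):
    # Build one component of the replay-buffer tuples as a (batch_size, state_dim) array,
    # failing loudly if any stored state has the wrong number of features.
    arrays = [np.asarray(item[index], dtype=np.float32) for item in batch]
    for i, a in enumerate(arrays):
        if a.shape != (state_dim,):
            raise ValueError(f"replay-buffer entry {i} has state shape {a.shape}, expected ({state_dim},)")
    return np.stack(arrays)  # shape (batch_size, state_dim)

With states = stack_states(batch, 0) and next_states = stack_states(batch, 3), the reshape above should then work, and tf.convert_to_tensor should no longer raise "setting an array element with a sequence".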

Hopefully this resolves the input-shape problem for your LSTM model!
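Related to this, the same 3D requirement applies to the single-state prediction in the epsilon-greedy step: model.predict(np.array([state])) passes a (1, 21) array. A minimal sketch, assuming get_state() returns a flat list of 21 floats:

state = get_state()  # list of 21 floats
state_batch = np.asarray(state, dtype=np.float32).reshape(1, 1, state_dim)  # (1, 1, 21)
Q_values_single = model.predict(state_batch)
action = int(np.argmax(Q_values_single))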
