pad_sequences changes the entire array in TensorFlow - Python

Question

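I am practicing LSTM networks in TensorFlow. I am currently learning about masking and padding for inputs of different lengths. However, when I use the pad_sequences method, I observe some strange behavior.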

import numpy as np
import tensorflow as tf

max_length = 8
X = []

for i in range(5):
    length = np.random.randint(1, max_length+1)
    data = np.random.randn(length, 4)
    X.append(data)

print(X)

[array([[ 0.23830355,  0.32776379,  0.19888588,  0.27975603],
        [-0.84285787, -0.76969476,  0.01841278, -0.88942005],
        [-1.51102046, -0.18195023, -1.32969908,  0.19397443]]),
 array([[-0.10567699, -0.79576066,  0.55816155, -0.70074442],
        [-0.0386933 ,  0.54722971, -1.71065981,  1.00276863],
        [ 1.82485917, -1.19912133, -1.91067831,  0.37120413]]),
 array([[ 0.03045082,  0.41638681, -1.49605253, -0.41086347],
        [ 0.65929396, -0.09148023, -0.22942781, -0.76795439],
        [ 0.56964325,  0.7318355 ,  1.41732107,  0.38632864],
        [ 0.78369032,  1.41461136, -1.32514831,  1.27382442],
        [-1.4822751 ,  0.44608809, -0.01882849,  0.78095785]]),
 array([[ 1.59961346, -0.74595856, -0.91752237, -1.81289865],
        [ 0.13899283, -0.93514456, -0.68329374, -0.91662576],
        [ 1.09513416,  0.83803103,  0.63074595, -1.88594795]]),
 array([[ 1.64358502, -2.28208926, -0.26371596, -0.59044336],
        [ 1.52187054,  1.42308418,  0.0275608 , -0.09422734]])]

First, I created a random dataset of sequences with different lengths. Next, I use the pad_sequences method to bring every input sequence to the same length.
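To make the mechanics concrete, here is a minimal NumPy sketch of what post-padding and post-truncation do for a list of (timesteps, features) arrays. The helper name pad_post is hypothetical and this is not the Keras implementation; it only assumes the same basic idea: allocate an array filled with the padding value in the requested dtype, then copy each sequence into it. Because the copy is cast to that dtype, an integer dtype silently drops the fractional part of every value.

```python
import numpy as np


def pad_post(sequences, maxlen, value, dtype="int32"):
    """Hypothetical helper: post-pad / post-truncate a list of 2-D arrays.

    Each element of `sequences` has shape (timesteps, features). The result
    has shape (len(sequences), maxlen, features) and the requested dtype.
    """
    num_features = sequences[0].shape[1]
    # Start from an array filled with the padding value in the target dtype.
    out = np.full((len(sequences), maxlen, num_features), value, dtype=dtype)
    for i, seq in enumerate(sequences):
        trunc = seq[:maxlen]  # 'post' truncation keeps the first maxlen steps
        # Cast to the target dtype; for an integer dtype astype truncates
        # every float toward zero (e.g. -0.84 -> 0, -1.51 -> -1).
        out[i, : len(trunc)] = np.asarray(trunc).astype(dtype)
    return out


X = [
    np.array([[0.24, -0.84], [-1.51, 0.19]]),
    np.array([[1.64, -2.28]]),
]
print(pad_post(X, maxlen=3, value=-1))                   # integer result
print(pad_post(X, maxlen=3, value=-1, dtype="float32"))  # values preserved
```

With the default integer dtype the real entries collapse to small integers, while the float32 call keeps them intact; only the dtype differs between the two calls.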

mask_val = -1
X_padded = tf.keras.preprocessing.sequence.pad_sequences(X, maxlen=max_length, padding='post', truncating='post', value=mask_val)
print(X_padded)
array([[[ 0,  0,  0,  0],
        [ 0,  0,  0,  0],
        [-1,  0, -1,  0],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1]],

       [[ 0,  0,  0,  0],
        [ 0,  0, -1,  1],
        [ 1, -1, -1,  0],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1]],

       [[ 0,  0, -1,  0],
        [ 0,  0,  0,  0],
        [ 0,  0,  1,  0],
        [ 0,  1, -1,  1],
        [-1,  0,  0,  0],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1]],

       [[ 1,  0,  0, -1],
        [ 0,  0,  0,  0],
        [ 1,  0,  0, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1]],

       [[ 1, -2,  0,  0],
        [ 1,  1,  0,  0],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1],
        [-1, -1, -1, -1]]])

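As you can see, TensorFlow fills every missing entry with -1, as intended. However, all the other entries are messed up. Can someone tell me what I am doing wrong here and how to fix it?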

Answer 1

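Just pass dtype='float32' as an argument. Otherwise the function uses its default dtype='int32' and casts every value to an integer, dropping the fractional part (it truncates toward zero rather than rounding to the nearest integer).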
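The cast can be checked directly with NumPy using a few values from the printed data. This only illustrates NumPy's float-to-int conversion; it assumes, as the output in the question suggests, that pad_sequences performs an equivalent cast when dtype is left at its default. Truncation explains the pattern above: -0.84 became 0, -1.51 became -1 and -2.28 became -2, whereas true rounding would have produced -1, -2 and -2.

```python
import numpy as np

# A few values taken from the first and last sequences in the question.
values = np.array([0.23830355, -0.84285787, -1.51102046, 1.64358502, -2.28208926])

# astype to an integer dtype drops the fractional part (truncation toward zero).
as_int = values.astype("int32")

# Rounding to the nearest integer first, for comparison.
rounded = np.rint(values).astype("int32")

print(as_int)
print(rounded)
```

Note also that with an integer dtype a genuine value such as -1.51 becomes -1, which is indistinguishable from the padding value mask_val = -1.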

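# dtype='float32' keeps the original float values instead of casting them to the default int32
X_padded = tf.keras.preprocessing.sequence.pad_sequences(X, maxlen=max_length, padding='post', truncating='post', value=mask_val, dtype='float32')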
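Since the goal is masking, it may help to see how the padded steps can be identified once the values are preserved. The NumPy sketch below follows the rule documented for the Keras Masking layer: a timestep is masked only when every feature at that step equals mask_value. The small array stands in for X_padded and its values are made up for illustration.

```python
import numpy as np

mask_val = -1.0

# Illustrative stand-in for X_padded: 2 samples, 4 timesteps, 3 features.
x_padded = np.array(
    [
        [[0.2, -0.8, 1.1], [-1.5, 0.3, 0.0], [-1, -1, -1], [-1, -1, -1]],
        [[1.6, -2.3, 0.4], [-1, -1, -1], [-1, -1, -1], [-1, -1, -1]],
    ],
    dtype="float32",
)

# A timestep is kept if ANY feature differs from mask_val,
# i.e. it is masked only when ALL features equal mask_val.
keep = np.any(x_padded != mask_val, axis=-1)  # shape (samples, timesteps)
lengths = keep.sum(axis=1)  # number of real timesteps per sample

print(keep)
print(lengths)
```

In Keras the same check is performed by tf.keras.layers.Masking(mask_value=mask_val) placed in front of the LSTM layer; it only works reliably if no real timestep has every feature exactly equal to mask_val.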
