Can't train and simulate my TensorFlow model with the original dataset (tf.data.Dataset), but it works after I split it!? Aren't both processed the same way?

Question

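I built a dataset to train a two-headed neural network: the first head uses an LSTM and the second a simple perceptron.

I handle the dataset in two ways: one version is split into training and test sets, and the second is left unsplit so that I can train, test, and simulate on the full data at the end.

Here is my code: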

# Functions to split the initial dataset into train and test datasets:
def is_test(x, _):
    return x % int(self.val_split * 100) == 0

def is_train(x, y):
    return not is_test(x, y)

recover = lambda x, y: y
full_dataset

# Split off the test/validation set.
test_set = full_dataset.enumerate().filter(is_test).map(recover)

# Split off the training set.
trainning_set = full_dataset.enumerate().filter(is_train).map(recover)

test_set = test_set.batch(batch_size).cache().prefetch(2)
trainning_set = trainning_set.batch(batch_size).cache().prefetch(2)

full_dataset = full_dataset.batch(batch_size).cache().prefetch(2)
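A side note on the split predicate above: the modulo test holds out one sample in every int(val_split * 100), which is not the same as holding out a val_split fraction. The sketch below reproduces the index logic in plain Python. The value val_split = 0.2 and the sample count are assumptions (the question does not state them), and split_indices is a hypothetical helper, not part of the original code.

```python
def split_indices(n_samples, val_split):
    """Mimic enumerate().filter(is_test / is_train) on plain indices."""
    period = int(val_split * 100)  # e.g. 0.2 -> every 20th sample
    test = [i for i in range(n_samples) if i % period == 0]
    train = [i for i in range(n_samples) if i % period != 0]
    return train, test


# Assumed values, for illustration only.
n_samples = 1000
train_idx, test_idx = split_indices(n_samples, val_split=0.2)

print("test share:", len(test_idx) / n_samples)
print("train share:", len(train_idx) / n_samples)
```

Under these assumptions the test set gets 5% of the samples rather than 20%, so the training set is almost as large as full_dataset. The two shares only match the intended fraction when val_split is 0.1.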

Checking each dataset:

full_dataset:
<_PrefetchDataset element_spec=({'input1': TensorSpec(shape=(None, None, 3), dtype=tf.float32, name=None), 'input2': TensorSpec(shape=(None, 13), dtype=tf.float32, name=None)}, TensorSpec(shape=(None,), dtype=tf.float32, name=None))>

test_set: 
<_PrefetchDataset element_spec=({'input1': TensorSpec(shape=(None, None, 3), dtype=tf.float32, name=None), 'input2': TensorSpec(shape=(None, 13), dtype=tf.float32, name=None)}, TensorSpec(shape=(None,), dtype=tf.float32, name=None))>

trainning_set:
<_PrefetchDataset element_spec=({'input1': TensorSpec(shape=(None, None, 3), dtype=tf.float32, name=None), 'input2': TensorSpec(shape=(None, 13), dtype=tf.float32, name=None)}, TensorSpec(shape=(None,), dtype=tf.float32, name=None))>

Now, why does training my model with the split training set work:

model.fit(trainning_set, validation_data=data.test_set)

but training my model on all the data doesn't work and produces nan?!

model.fit(full_dataset)

Epoch 1/5
160/160 - 2s - loss: nan - nash_sutcliffe: nan - 2s/epoch - 12ms/step
Epoch 2/5
160/160 - 0s - loss: nan - nash_sutcliffe: nan - 319ms/epoch - 2ms/step
...

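I did some searching and testing, but I can't find what differs between these two versions of the dataset, or why one works and the other doesn't!?

Here are samples from my test_set and full_dataset before batching. As you can see, they are the same, except that the input1 values in test_set are printed with more rounding (?!), though they are still float32.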

for inputs, targets in test_set.take(1):
    print("Feature:", inputs)
    print("Label:", targets)

Feature: {'input1': <tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[ 0.  , 16.12,  0.  ],
       [ 0.  , 17.42,  0.57],
       [ 0.  , 11.36, 13.97],
       [ 0.  , 10.55,  0.96],
       [ 0.  , 11.56,  0.24]], dtype=float32)>, 'input2': <tf.Tensor: shape=(13,), dtype=float32, numpy=
array([1.4391040e+02, 5.4850894e+03, 8.7901926e+00, 3.6657768e+01,
       5.4554661e+01, 9.5567673e+01, 2.0000000e+00, 5.8438915e+01,
       2.0383540e+03, 6.7381866e+01, 5.6437737e+01, 4.7759323e+00,
       0.0000000e+00], dtype=float32)>}
Label: tf.Tensor(0.91, shape=(), dtype=float32)

for inputs, targets in full_dataset.take(1):
    print("Feature:", inputs)
    print("Label:", targets)

Feature: {'input1': <tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[0.000e+00, 9.860e+00, 0.000e+00],
       [0.000e+00, 1.308e+01, 0.000e+00],
       [0.000e+00, 1.433e+01, 1.000e-02],
       [0.000e+00, 1.630e+01, 0.000e+00],
       [0.000e+00, 1.644e+01, 0.000e+00]], dtype=float32)>, 'input2': <tf.Tensor: shape=(13,), dtype=float32, numpy=
array([1.4391040e+02, 5.4850894e+03, 8.7901926e+00, 3.6657768e+01,
       5.4554661e+01, 9.5567673e+01, 2.0000000e+00, 5.8438915e+01,
       2.0383540e+03, 6.7381866e+01, 5.6437737e+01, 4.7759323e+00,
       0.0000000e+00], dtype=float32)>}
Label: tf.Tensor(0.79, shape=(), dtype=float32)

Answer 1

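(Copied from the comments)

Have you tried training for more epochs on the split set? To me, it looks like both should head towards nan values, because you are using unscaled data and I assume the model contains something like a ReLU activation function. full_dataset should reach nan sooner, because there is more data per epoch and therefore more gradient steps at the same batch size. More gradient updates per epoch make the weights in the network explode faster.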
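To illustrate why unscaled inputs can drive the loss to nan, here is a minimal sketch in which a linear model trained by plain gradient descent stands in for the network. Everything in it is an assumption made for illustration: the synthetic features (with magnitudes loosely modelled on the input2 sample in the question), the learning rate, the step count, and the helper name final_loss. It is not the asker's model, and for brevity it scales with statistics from all rows instead of a separate training split.

```python
import numpy as np


def final_loss(x, y, lr=0.01, steps=200):
    """Run gradient descent on mean squared error and return the last loss."""
    w = np.zeros(x.shape[1])
    b = 0.0
    # Overflow is expected for the unscaled run, so silence the warnings.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            err = x @ w + b - y
            w = w - lr * 2.0 * (x.T @ err) / len(y)
            b = b - lr * 2.0 * err.mean()
        return float(np.mean((x @ w + b - y) ** 2))


# Synthetic stand-in data: three features on very different scales.
rng = np.random.default_rng(0)
x_raw = rng.normal(loc=[143.9, 5485.0, 8.79],
                   scale=[10.0, 500.0, 2.0],
                   size=(256, 3))
y = rng.uniform(0.0, 1.0, size=256)

# Standardize each feature column to zero mean and unit variance.
x_scaled = (x_raw - x_raw.mean(axis=0)) / x_raw.std(axis=0)

loss_raw = final_loss(x_raw, y)
loss_scaled = final_loss(x_scaled, y)
print("unscaled loss is finite:", np.isfinite(loss_raw))
print("scaled loss is finite:", np.isfinite(loss_scaled))
```

With features in the thousands, each update overshoots by a large factor, so the weights grow until they overflow to inf and then nan. The same learning rate is stable once the features are standardized.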

Solution: use something like StandardScaler on the data (and don't forget to split the data into train and test first, and fit only on the training data).
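As a rough sketch of that advice, the snippet below computes the scaling statistics from the training rows only and then applies them to both splits, using NumPy in place of scikit-learn's StandardScaler. The helper names fit_scaler and apply_scaler, the synthetic data, and the every-fifth-row split are all hypothetical. The guard for constant columns is an addition prompted by the all-zero column visible in the question's input1 sample, where dividing by a zero standard deviation would itself produce nan.

```python
import numpy as np


def fit_scaler(train_features):
    """Compute per-feature mean and std from the training rows only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    # Constant columns have std == 0; use 1.0 to avoid dividing by zero.
    std = np.where(std == 0.0, 1.0, std)
    return mean, std


def apply_scaler(features, mean, std):
    """Standardize features with statistics computed elsewhere."""
    return (features - mean) / std


# Hypothetical stand-in data: the first column is constant zero.
rng = np.random.default_rng(0)
features = np.column_stack([
    np.zeros(100),
    rng.normal(13.0, 3.0, size=100),
    rng.normal(1.0, 2.0, size=100),
])

# Split first (every 5th row goes to the test set), then fit on train only.
is_test_row = np.arange(len(features)) % 5 == 0
train, test = features[~is_test_row], features[is_test_row]

mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)
```

In a tf.data pipeline the same mean and std could be applied inside a map step before batching. Fitting on the training split alone keeps test-set statistics from leaking into training.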