1D CNN+LSTM configuration exception (dl4j)

Problem description

Network configuration:

protected int[] cnnStrides = {1, 2};   // stride for each CNN layer
protected int[] cnnNeurons = {72, 36}; // number of channels (nOut) for each CNN layer
protected int[] rnnNeurons = {64, 32}; // number of units for each LSTM layer
int[] cnnKernelSizes = {3, 3};         // kernel size for each CNN layer
int[] cnnPaddings = {1, 1};            // padding for each CNN layer

public MultiLayerConfiguration getNetConf() {
    DataType dataType = DataType.FLOAT;
    NeuralNetConfiguration.Builder nncBuilder = new NeuralNetConfiguration.Builder()
            .seed(System.currentTimeMillis())
            .weightInit(WeightInit.XAVIER)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Adam(lrSchedule))
            //                .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
            .dataType(dataType);

    nncBuilder.l1(l1);
    nncBuilder.l2(l2);

    NeuralNetConfiguration.ListBuilder listBuilder = nncBuilder.list();
    int nIn = featuresCount; // 36
    int layerIndex = 0;
    final int cnnLayerCount = cnnNeurons.length;
   
    // Add CNN layers
    for (int i = 0; i < cnnLayerCount; i++) {
        listBuilder.layer(layerIndex, new Convolution1D.Builder()
                .kernelSize(cnnKernelSizes[i])
                .stride(cnnStrides[i])
                .padding(cnnPaddings[i])
                .nIn(nIn)
                .nOut(cnnNeurons[i])
                .activation(Activation.RELU)
                .build());
        
        nIn = cnnNeurons[i];
        ++layerIndex;
    }
    
    // Add RNN layers
    for (int i = 0; i < this.rnnNeurons.length; ++i) {
        listBuilder.layer(layerIndex, new LSTM.Builder()
                .dropOut(dropOut)
                .activation(Activation.SOFTSIGN)
                .nIn(nIn)
                .nOut(rnnNeurons[i])
                .build());

        nIn = rnnNeurons[i];
        ++layerIndex;

    }

    listBuilder.layer(layerIndex, new RnnOutputLayer.Builder(new LossMSE())
            .updater(new Adam(outLrSchedule))
            .activation(Activation.IDENTITY)
            .nIn(nIn).nOut(1)
            .build());

    MultiLayerConfiguration conf = listBuilder.build();
    return conf;
}
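As an aside, instead of hand-wiring nIn for every layer, DL4J can infer layer sizes and insert the appropriate CNN/RNN preprocessors when the input type is declared on the list builder. A minimal configuration sketch, assuming the features are laid out as [minibatch, featuresCount, seqLength] (NCW) and that a seqLength variable is available — both assumptions, not code from the question:

```java
// Sketch: declare the input type so DL4J infers each layer's nIn and
// inserts the right preprocessors between Convolution1D and LSTM layers.
// Assumes NCW layout: [minibatch, featuresCount, seqLength].
listBuilder.setInputType(InputType.recurrent(featuresCount, seqLength, RNNFormat.NCW));
```

With the input type set, the explicit .nIn(...) calls on each layer become unnecessary.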

Exception in thread "main" java.lang.IllegalStateException: Sequence lengths do not match for RnnOutputLayer input and labels: Arrays should be rank 3 with shape [minibatch, size, sequenceLength] - mismatch on dimension 2 (sequence length) - input=[256, 32, 30] vs. label=[256, 1, 30]
at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:639)
at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:337)
at org.deeplearning4j.nn.layers.recurrent.RnnOutputLayer.backpropGradient(RnnOutputLayer.java:59)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.calcBackpropGradients(MultiLayerNetwork.java:1998)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2813)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2756)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1767)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1688)
at com.cq.aifocusstocks.train.RnnPredictModel.train(RnnPredictModel.java:175)
at com.cq.aifocusstocks.train.CnnLstmRegPredictor.trainModel(CnnLstmRegPredictor.java:209)
at com.cq.aifocusstocks.train.TrainCnnLstmModel.main(TrainCnnLstmModel.java:15)

The exception message says the sequence lengths of the output layer's input and labels do not match, yet both are 30. The mismatch actually appears to be on the middle dimension (input=[256, 32, 30] vs. label=[256, 1, 30]). The output layer's nIn is 32 — shouldn't the labels match the shape of the output rather than the input? Why would they need to match the input shape?

conv-neural-network lstm dl4j
1 Answer
listBuilder.layer(layerIndex, new RnnOutputLayer.Builder(new LossMSE())
        .updater(new Adam(outLrSchedule))
        .activation(Activation.IDENTITY)
        .nIn(nIn).nOut(1)
        .dataFormat(RNNFormat.NCW) // <-- the fix: declare the [minibatch, size, seqLength] layout explicitly
        .build());

This resolved the exception, but a new one was raised. The stride of 2 in the CNN does not appear to have halved the sequence length as expected.
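Note that 7680 = 256 × 30 and 15360 = 256 × 60, so the two flattened shapes in the error differ by exactly one stride-2 halving. The standard 1D convolution output-length formula shows what each layer should produce; below is a self-contained check in plain Java (independent of DL4J) using the kernel/stride/padding values from the question, with an input length of 60 chosen purely for illustration:

```java
public class Conv1DLengthCheck {
    // Standard conv output length (no dilation):
    // out = floor((in + 2*padding - kernel) / stride) + 1
    static int convOutLength(int in, int kernel, int stride, int padding) {
        return (in + 2 * padding - kernel) / stride + 1;
    }

    public static void main(String[] args) {
        int[] kernels = {3, 3}, strides = {1, 2}, paddings = {1, 1};
        int len = 60; // hypothetical input sequence length
        for (int i = 0; i < kernels.length; i++) {
            len = convOutLength(len, kernels[i], strides[i], paddings[i]);
            System.out.println("after CNN layer " + i + ": seqLength = " + len);
        }
        // layer 0 (kernel 3, stride 1, pad 1): 60 -> 60
        // layer 1 (kernel 3, stride 2, pad 1): 60 -> 30
    }
}
```

If the actual tensors do not shrink this way, the stride is being applied along a different dimension than intended, which again points at the data-format (NCW vs. NWC) interpretation of the activations.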

Exception in thread "main" java.lang.IllegalStateException: Mismatched shapes (shape = [7680, 1], column vector shape =[15360, 1])
at org.nd4j.linalg.api.ndarray.BaseNDArray.doColumnWise(BaseNDArray.java:2398)
at org.nd4j.linalg.api.ndarray.BaseNDArray.muliColumnVector(BaseNDArray.java:2818)
at org.deeplearning4j.nn.layers.BaseOutputLayer.applyMask(BaseOutputLayer.java:332)
at org.deeplearning4j.nn.layers.BaseLayer.preOutputWithPreNorm(BaseLayer.java:336)
at org.deeplearning4j.nn.layers.BaseLayer.preOutput(BaseLayer.java:296)
at org.deeplearning4j.nn.layers.recurrent.RnnOutputLayer.preOutput2d(RnnOutputLayer.java:119)
at org.deeplearning4j.nn.layers.BaseOutputLayer.backpropGradient(BaseOutputLayer.java:144)
at org.deeplearning4j.nn.layers.recurrent.RnnOutputLayer.backpropGradient(RnnOutputLayer.java:72)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.calcBackpropGradients(MultiLayerNetwork.java:1998)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2813)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2756)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1767)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1688)
at com.cq.aifocusstocks.train.RnnPredictModel.train(RnnPredictModel.java:175)