Backpropagation producing strange values in a C++ neural network


I'm trying to solve the iris dataset with a neural network I wrote from scratch in C++. The network classifies the 150 rows into 3 different flower species (4 feature columns per row, with the fifth column, the species, converted to 0, 1, or 2).

The problem: whenever I run the network, it goes through a test set of 90 rows split into 3 different flowers (30, 30, 30). In the first epoch it shows very high output values, such as (0.99, 0.99, 0.98). It keeps doing this for a few epochs, and the values eventually drop to something more reasonable. However, by a later epoch, say epoch 50, the output for the correct flower gets closer and closer to 1.00 (for each flower in turn), then it does the same for the next flower and the one after that, and then it starts the process over again. It does not start out near 1.0, which would indicate it had learned and the weights had been adjusted properly.

Console output while running epochs (each epoch runs forward_prop(), back_prop(), and then update_network()); after every epoch it prints the network's output values. Since it prints at the end of an epoch, the actual (target) values for the row being shown are always {0, 0, 1}. When I ran the network for 1000 epochs, the output values stopped changing after epoch 15. Why is it doing this?

File parsed, weights and bias randomized

Epoch 1

0.97 0.97 0.99 Epoch 2

0.93 0.94 0.99 Epoch 3

0.64 0.70 0.99 Epoch 4

0.27 0.36 0.99 Epoch 5

0.22 0.31 0.99 Epoch 6

0.21 0.30 0.99 Epoch 7

0.21 0.30 0.98 Epoch 8

0.21 0.30 0.98 Epoch 9

0.21 0.30 0.96 Epoch 10

0.21 0.30 0.88 Epoch 11

0.21 0.30 0.66 Epoch 12

0.21 0.30 0.56 Epoch 13

0.21 0.30 0.54 Epoch 14

0.21 0.30 0.53 Epoch 15

0.21 0.30 0.53 completed successfully

End of console output.


Example from epoch 9 (output printed for every row):

0.21 0.30 0.98
0.21 0.30 0.98
0.22 0.29 0.98
0.23 0.29 0.98
0.24 0.28 0.98
0.25 0.28 0.98
0.25 0.27 0.98
0.26 0.27 0.98 
0.27 0.27 0.98
0.28 0.26 0.98
0.29 0.26 0.98
0.30 0.26 0.98
0.31 0.26 0.98
0.32 0.25 0.98
0.34 0.25 0.98
0.35 0.24 0.98
0.36 0.24 0.98
0.37 0.24 0.98 
0.38 0.24 0.98
0.40 0.23 0.98
0.41 0.23 0.98
0.42 0.23 0.98
0.43 0.23 0.98
0.44 0.22 0.98
0.45 0.22 0.98
0.46 0.22 0.98 
0.48 0.22 0.98
0.49 0.22 0.98
0.50 0.21 0.98 
0.51 0.21 0.98
0.53 0.20 0.98
0.52 0.21 0.98
0.50 0.22 0.98
0.49 0.23 0.98
0.48 0.24 0.98
0.47 0.24 0.98
0.46 0.25 0.98
0.45 0.26 0.98
0.44 0.27 0.98 
0.43 0.28 0.98
0.42 0.29 0.98
0.42 0.30 0.98
0.41 0.32 0.98 
0.40 0.33 0.98
0.39 0.34 0.98
0.38 0.35 0.98
0.38 0.36 0.98
0.37 0.37 0.98
0.36 0.38 0.98
0.35 0.40 0.98
0.35 0.41 0.98
0.34 0.42 0.98
0.34 0.43 0.98
0.33 0.44 0.98
0.32 0.46 0.98 
0.32 0.47 0.98
0.31 0.48 0.98
0.31 0.49 0.98 
0.30 0.50 0.98
0.30 0.51 0.97
0.30 0.52 0.98
0.29 0.51 0.98
0.29 0.50 0.98
0.28 0.49 0.98
0.28 0.48 0.98
0.27 0.47 0.98
0.27 0.46 0.97 
0.27 0.45 0.98
0.26 0.44 0.98
0.26 0.43 0.98
0.26 0.42 0.98
0.25 0.41 0.98
0.25 0.40 0.98
0.25 0.40 0.98
0.24 0.39 0.98 
0.24 0.38 0.98
0.24 0.37 0.98
0.24 0.37 0.98
0.23 0.36 0.98
0.23 0.35 0.98 
0.23 0.35 0.98
0.23 0.34 0.98
0.22 0.33 0.98
0.22 0.33 0.98
0.22 0.32 0.98
0.22 0.32 0.98
0.21 0.31 0.98
0.21 0.31 0.98
0.21 0.30 0.98 
0.21 0.30 0.98 Epoch 9

So within epoch 9, the actual (target) values for the first 30 rows are {1, 0, 0}, then {0, 1, 0} for the next 30 rows, and finally {0, 0, 1} for the last 30 rows. Notice how the first two outputs move closer and closer row by row, but the last output stays the same, and it stays the same across all epochs. That seems strange, and I'm not sure why it's doing that.


So the basic structure of the program is:

main() runs, declaring and initializing the neural network class with input, hidden, and output layers.

train() is called, which then runs epoch() in a loop for the number of times specified in the call to train().

epoch() itself runs forward_prop(), back_prop(), and finally update_network(); it also holds some variables, such as arrays for the expected and actual output values.

The vectors bias, values, weights, and errors each hold the network's corresponding values separately, which I found better for readability. The first position, [0], of the weights vector is empty: the input values use the weights stored at the hidden layer, and the hidden layer's values use the weights stored at the output layer.

Each node's weights are a vector whose size equals the number of nodes in the previous layer, and position [0] in that weight vector is used with the node at position [0] in the previous layer.
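
To make that layout concrete, here is a minimal standalone sketch (an editor's illustration, not part of the program below, assuming the { 4, 6, 3 } layer sizes used in initialize()) of how the nested weights vector lines up:

#include <iostream>
#include <vector>

int main()
{
    std::vector<size_t> layers{ 4, 6, 3 };
    // weights[layer][node][prev_node]; weights[0] stays empty because the
    // input layer has no incoming connections.
    std::vector<std::vector<std::vector<double>>> weights(layers.size());
    for (size_t i = 1; i < layers.size(); ++i)
        weights[i].assign(layers[i], std::vector<double>(layers[i - 1], 0.0));

    for (size_t i = 0; i < weights.size(); ++i)
        std::cout << "layer " << i << ": " << weights[i].size() << " node(s), "
                  << (weights[i].empty() ? 0 : weights[i][0].size())
                  << " incoming weight(s) each\n";
}

This prints 0 weight vectors for layer 0, six vectors of 4 weights for the hidden layer, and three vectors of 6 weights for the output layer, matching the description above.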


#include <iostream>
#include <cstdlib>
#include <iomanip>
#include <cmath>
#include <fstream>
#include <sstream>
#include <vector>
#include <array>
#include <string>
#include <numeric>

class Neural_Network
{
private:
    std::vector<std::array<double, 4>> training_set; // 30 setosa -> 30 versicolor -> 30 virginica
    std::vector<std::vector<double>> values, bias, errors;
    std::vector<std::vector<std::vector<double>>> weights;
    size_t net_size = 0;
    double dot_val(std::vector<double> val, std::vector<double> weights);
    double sigmoid(const double num);
    double random_number();
    double transfer_derivitive(double num);
    void initialize(std::vector<size_t> layers);
    void forward_prop(std::vector<double>& expected);
    void back_prop(std::vector<double> expected);
    void update_network(double l_rate);

public:
    Neural_Network(const std::vector<std::array<double, 4>>& data);
    ~Neural_Network() = default;
    void train(size_t epochs = 1);
    void display();
};

Neural_Network::Neural_Network(const std::vector<std::array<double, 4>>& data) : training_set{ data }
{
    initialize({ 4, 6, 3 });
}

double Neural_Network::dot_val(std::vector<double> val, std::vector<double> weights)
{
    return std::inner_product(val.begin(), val.end(), weights.begin(), 0.0);
}

double Neural_Network::sigmoid(const double num)
{
    return (1 / (1 + exp(-num)));
}

double Neural_Network::random_number()
{
    return (double)rand() / (double)RAND_MAX;
}

double Neural_Network::transfer_derivitive(double num)
{
    return num * (1 - num);
}

void Neural_Network::display()
{
    std::cout << std::fixed << std::setprecision(2) << "values:\n";
    for (size_t i = 0; i < values.size(); ++i)
    {
        std::cout << "layer " << i << "\n[ ";
        for (size_t j = 0; j < values[i].size(); ++j)
            std::cout << values.at(i).at(j) << " ";
        std::cout << " ]\n";
    }
}

void Neural_Network::initialize(std::vector<size_t> layers)
{
    for (size_t i = 0; i < layers.size(); ++i)
    {
        std::vector<double> v{}, b{}, e{};
        std::vector<std::vector<double>> w{};
        //initializing the nodes in the layers
        for (size_t j = 0; j < layers.at(i); ++j)
        {
            v.push_back(0);
            b.push_back(random_number());
            e.push_back(1);
            std::vector<double> inner_w{};
            if (i != 0)                                    // checking if the current layer is the input
                for (size_t k = 0; k < layers.at(i - 1); ++k) // adding weights to the current layer to the amount of nodes in the next layer
                    inner_w.push_back(random_number());    // adding a weight to the current layer for a node in the next layer
            w.push_back(inner_w);
        }
        values.push_back(v);
        bias.push_back(b);
        errors.push_back(e);
        weights.push_back(w);
        ++net_size;
    }
    std::cout << "initialize network success" << std::endl;
}

void Neural_Network::train(size_t epoch_count)
{
    const size_t count = epoch_count;
    while (epoch_count > 0)
    {
        std::cout << "\nEpoch " << 1 + (count - epoch_count) << std::endl;
        for (size_t i = 0; i < 90; ++i)
        {
            std::vector<double> expected{ 0, 0, 0 };
            if (i < 30)
                expected[0] = 1;
            else if (i < 60)
                expected[1] = 1;
            else if (i < 90)
                expected[2] = 1;
            for (size_t j = 0; j < values[0].size(); ++j) // Initialize input layer values
                values.at(0).at(j) = training_set.at(i).at(j);        // value[0] is the input layer, j is the node
            forward_prop(expected);
            back_prop(expected);
            update_network(0.05);
        }
        display();
        --epoch_count;
    }
}

void Neural_Network::forward_prop(std::vector<double>& expected)
{
    for (size_t i = 1; i < net_size - 1; ++i)                                           // looping through every layer except the first and last
        for (size_t j = 0; j < values.at(i).size(); ++j)                                   // looping through every node in the current non input/output layer
            values.at(i).at(j) = sigmoid(dot_val(values.at(i - 1), weights.at(i).at(j)) + bias.at(i).at(j)); // assigning node j of layer i a sigmoided val that is the dotval + the associated bias
    for (size_t i = 0; i < values.at(net_size - 1).size(); ++i)                            // looping through the ouptut layer
        values.at(net_size - 1).at(i) = sigmoid(dot_val(values.at(net_size - 2), weights.at(net_size - 1).at(i)) + bias.at(net_size - 1).at(i));
}

void Neural_Network::back_prop(std::vector<double> expected) // work backwards from the output layer
{
    std::vector<double> output_errors{};
    for (size_t i = 0; i < errors.at(net_size - 1).size(); ++i) // looping through the output layer
    {
        output_errors.push_back(expected.at(i) - values.at(net_size - 1).at(i));
        errors.at(net_size - 1).at(i) = output_errors.at(i) * transfer_derivitive(values.at(net_size - 1).at(i));
    }                                         // output layer finished
    for (size_t i = net_size - 2; i > 0; i--) // looping through the non output layers backwards
    {
        std::vector<double> layer_errors{};
        for (size_t j = 0; j < errors.at(i).size(); ++j) // looping through the current layer's nodes
        {
            double error = 0;
            for (size_t k = 0; k < weights.at(i + 1).size(); ++k) // looping through the current set of weights
                error += errors.at(i).at(j) * weights.at(i + 1).at(k).at(j);
            layer_errors.push_back(error);
        }
        for (size_t j = 0; j < layer_errors.size(); ++j)
            errors.at(i).at(j) = layer_errors.at(j) * transfer_derivitive(values.at(i).at(j));
    }
}

void Neural_Network::update_network(double l_rate)
{
    for (size_t i = 1; i < net_size; ++i)
    {
        for (size_t j = 0; j < weights.at(i).size(); ++j)
        {
            for (size_t k = 0; k < weights.at(i).at(j).size(); ++k)
                weights.at(i).at(j).at(k) += l_rate * errors.at(i).at(j) * values.at(i - 1).at(j);
            bias.at(i).at(j) += l_rate * errors.at(i).at(j);
        }
    }
}

int main()
{
    std::vector<std::array<double, 4>> data = {
        {5.1, 3.5, 1.4, 0.2},
        {4.9, 3, 1.4, 0.2},
        {4.7, 3.2, 1.3, 0.2},
        {4.6, 3.1, 1.5, 0.2},
        {5, 3.6, 1.4, 0.2},
        {5.4, 3.9, 1.7, 0.4},
        {4.6, 3.4, 1.4, 0.3},
        {5, 3.4, 1.5, 0.2},
        {4.4, 2.9, 1.4, 0.2},
        {4.9, 3.1, 1.5, 0.1},
        {5.4, 3.7, 1.5, 0.2},
        {4.8, 3.4, 1.6, 0.2},
        {4.8, 3, 1.4, 0.1},
        {4.3, 3, 1.1, 0.1},
        {5.8, 4, 1.2, 0.2},
        {5.7, 4.4, 1.5, 0.4},
        {5.4, 3.9, 1.3, 0.4},
        {5.1, 3.5, 1.4, 0.3},
        {5.7, 3.8, 1.7, 0.3},
        {5.1, 3.8, 1.5, 0.3},
        {5.4, 3.4, 1.7, 0.2},
        {5.1, 3.7, 1.5, 0.4},
        {4.6, 3.6, 1, 0.2},
        {5.1, 3.3, 1.7, 0.5},
        {4.8, 3.4, 1.9, 0.2},
        {5, 3, 1.6, 0.2},
        {5, 3.4, 1.6, 0.4},
        {5.2, 3.5, 1.5, 0.2},
        {5.2, 3.4, 1.4, 0.2},
        {4.7, 3.2, 1.6, 0.2},
        {7, 3.2, 4.7, 1.4},
        {6.4, 3.2, 4.5, 1.5},
        {6.9, 3.1, 4.9, 1.5},
        {5.5, 2.3, 4, 1.3},
        {6.5, 2.8, 4.6, 1.5},
        {5.7, 2.8, 4.5, 1.3},
        {6.3, 3.3, 4.7, 1.6},
        {4.9, 2.4, 3.3, 1},
        {6.6, 2.9, 4.6, 1.3},
        {5.2, 2.7, 3.9, 1.4},
        {5, 2, 3.5, 1},
        {5.9, 3, 4.2, 1.5},
        {6, 2.2, 4, 1},
        {6.1, 2.9, 4.7, 1.4},
        {5.6, 2.9, 3.6, 1.3},
        {6.7, 3.1, 4.4, 1.4},
        {5.6, 3, 4.5, 1.5},
        {5.8, 2.7, 4.1, 1},
        {6.2, 2.2, 4.5, 1.5},
        {5.6, 2.5, 3.9, 1.1},
        {5.9, 3.2, 4.8, 1.8},
        {6.1, 2.8, 4, 1.3},
        {6.3, 2.5, 4.9, 1.5},
        {6.1, 2.8, 4.7, 1.2},
        {6.4, 2.9, 4.3, 1.3},
        {6.6, 3, 4.4, 1.4},
        {6.8, 2.8, 4.8, 1.4},
        {6.7, 3, 5, 1.7},
        {6, 2.9, 4.5, 1.5},
        {5.7, 2.6, 3.5, 1},
        {6.3, 3.3, 6, 2.5},
        {5.8, 2.7, 5.1, 1.9},
        {7.1, 3, 5.9, 2.1},
        {6.3, 2.9, 5.6, 1.8},
        {6.5, 3, 5.8, 2.2},
        {7.6, 3, 6.6, 2.1},
        {4.9, 2.5, 4.5, 1.7},
        {7.3, 2.9, 6.3, 1.8},
        {6.7, 2.5, 5.8, 1.8},
        {7.2, 3.6, 6.1, 2.5},
        {6.5, 3.2, 5.1, 2},
        {6.4, 2.7, 5.3, 1.9},
        {6.8, 3, 5.5, 2.1},
        {5.7, 2.5, 5, 2},
        {5.8, 2.8, 5.1, 2.4},
        {6.4, 3.2, 5.3, 2.3},
        {6.5, 3, 5.5, 1.8},
        {7.7, 3.8, 6.7, 2.2},
        {7.7, 2.6, 6.9, 2.3},
        {6, 2.2, 5, 1.5},
        {6.9, 3.2, 5.7, 2.3},
        {5.6, 2.8, 4.9, 2},
        {7.7, 2.8, 6.7, 2},
        {6.3, 2.7, 4.9, 1.8},
        {6.7, 3.3, 5.7, 2.1},
        {7.2, 3.2, 6, 1.8},
        {6.2, 2.8, 4.8, 1.8},
        {6.1, 3, 4.9, 1.8},
        {6.4, 2.8, 5.6, 2.1},
        {7.2, 3, 5.8, 1.6}
    };

    Neural_Network network{ data };
    network.train(1);
    return 0;
}

Edit: the program now uses .at() instead of [] to access each std::vector.

I hope I've made everything clear; if not, let me know.

Note: I originally posted this question on Stack Overflow and was told I should move it to codereview.stackexchange; there I was told I should move it back to Stack Overflow and refine my question with more detail. Please don't tell me to move this question a third time. If there is a problem with how I'm asking, please give me a chance to change it or add information so I can get some help. Thank you.

c++ machine-learning neural-network backpropagation
1 Answer

One obvious error is in dot_val:

double Neural_Network::dot_val(std::vector<double> val,std::vector<double> weights)
{
    double output;  // <-- This is uninitialized
    for (size_t i = 0; i < weights.size(); ++i)
        output += val[i] * weights[i];
    return output;  // <-- Who knows what this will be
}

You're using an uninitialized variable. You can either initialize output to 0, or use std::inner_product:

#include <numeric>
//...
double Neural_Network::dot_val(std::vector<double> val,std::vector<double> weights)
{
    return std::inner_product(val.begin(), val.end(), weights.begin(), 0.0);
}
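
For completeness, the other option mentioned above, simply zero-initializing the accumulator, would look like this minimal sketch that keeps the original loop:

double Neural_Network::dot_val(std::vector<double> val, std::vector<double> weights)
{
    double output = 0.0; // zero-initialized accumulator instead of an indeterminate value
    for (size_t i = 0; i < weights.size(); ++i)
        output += val[i] * weights[i];
    return output;
}

Either way, reading an uninitialized double is undefined behavior, so the accumulated dot product can start from any garbage value.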