Perceptron in R not converging

Problem description · Votes: 2 · Answers: 1

I'm trying to get a better understanding of neural networks, so I'm attempting to implement a simple perceptron from scratch in R. I know this is very inefficient, since there are plenty of libraries that do this in a highly optimized way, but my goal is to understand the basics of neural networks better and to work my way up to more complex models.

I created some artificial test data with a very simple linear decision boundary and split it into a training set and a test set. I then ran a logistic regression on the training data and checked the predictions on the test set, obtaining 99%+ accuracy, which is to be expected given the simple nature of the data. Then I tried to implement a perceptron with 2 inputs, 1 neuron, 1000 iterations, a learning rate of 0.1, and a sigmoid activation function.

I expected accuracy very similar to the logistic regression model, but my results are much worse (roughly 70% correctly classified in the training set), so I must be doing something wrong. The predictions only seem to improve during the first few iterations and then just oscillate back and forth around a particular value (I tried many different learning rates, none with success). I'm attaching my script and would greatly appreciate any advice! I think the problem is in the calculation of the error or in the weight adjustment, but I can't put my finger on it...

### Reproducible Example for StackOverflow


#### Setup

# loading libraries
library(data.table)

# remove scientific notation
options(scipen = 999)

# setting seed for random number generation
seed <- 123




#### Self-made Test Data

# input points (seed set here so the generated data are reproducible)
set.seed(seed)
x1 <- runif(10000,-100,100)
x2 <- runif(10000,-100,100)

# setting decision boundary to create output
output <- vector()
output[0.5*x1 + -1.2*x2 >= 50] <- 0
output[0.5*x1 + -1.2*x2 < 50] <- 1

# combining to dataframe
points <- cbind.data.frame(x1,x2,output)

# plotting all data points
plot(points$x1,points$x2, col = as.factor(points$output), main = "Self-created data", xlab = "x1",ylab = "x2")

# split into test and training sets
trainsize = 0.2
set.seed(seed)
train_rows <- sample(1:dim(points)[1], size = trainsize * dim(points)[1])
train <- points[train_rows,]
test <- points[-c(train_rows),]

# plotting training set only
plot(train$x1,train$x2, col = as.factor(train$output), main = "Self-created data (training set)", xlab = "x1",ylab = "x2")





#### Approaching the problem with logistic regression

# building model
train_logit <- glm(output ~ x1 + x2, data = train, family = "binomial", maxit = 10000)
summary(train_logit)

# testing performance in training set
table(round(train_logit$fitted.values) == train$output)

# testing performance of train_logit model in test set
table(test$output == round(predict(train_logit,test[,c(1,2)], type = "response")))

# We get 100% accuracy in the training set and near 100% accuracy in the test set









#### Approaching Problem with a Perceptron from scratch


# setting inputs, outputs and weights
inputs <- as.matrix(train[,c(1,2)])
output <- as.matrix(train[,3])
set.seed(123456)
weights <- as.matrix(runif(dim(inputs)[2],-1,1))


## Defining activation function + derivative

# defining the sigmoid and its derivative
sigmoid <- function(x) {1 / (1 + exp(-x))}
sig_dir <- function(x){sigmoid(x)*(1 - sigmoid(x))}


## Perceptron Initial Settings
bias <- 1

# number of iterations
iterations <- 1000

# setting learning rate
alpha <- 0.1



## Perceptron

# creating vectors for saving results per iteration
weights_list <- list()
weights_list[[1]] <- weights
errors_vec <- vector()
outputs_vec <- vector()

# saving results across iterations
weights_list_all <- list()
outputs_list <- list()
errors_list <- list()


# looping through the backpropagation algorithm "iterations" times
for (j in 1:iterations) {

  # Loop for backpropagation with updating weights after every datapoint
  for (i in 1:dim(train)[1]) {

    # taking the weights from the last iteration of the outer loop as a starting point
    if (j > 1) {

      weights_list[[1]] <- weights

    }

    # Feed Forward (Should we really round this?!)
    output_pred <- round(sigmoid(sum(inputs[i,] * as.numeric(weights)) + bias))
    error <- output_pred - output[i]

    # Backpropagation (Do I need the sigmoid derivative AND a learning rate? Or should I only take one of them?)
    weight_adjustments <- inputs[i,] * (error * sig_dir(output_pred)) * alpha
    weights <- weights - weight_adjustments

    # saving progress for later plots
    weights_list[[i + 1]] <- weights
    errors_vec[i] <- error
    outputs_vec[[i]] <- output_pred

  }

  # saving results for each iteration
  weights_list_all[[j]] <- weights_list
  outputs_list[[j]] <- outputs_vec
  errors_list[[j]] <- errors_vec

}



#### Formatting Diagnostics for easier plotting

# implementing empty list to transform weightslist
WeightList <- list()

# collapsing individual weightslist into datafames
for (i in 1:iterations) {

  WeightList[[i]] <- t(data.table::rbindlist(weights_list_all[i]))

}

# pasting dataframes together
WeightFrame <- do.call(rbind.data.frame, WeightList)
colnames(WeightFrame) <- paste("w",1:dim(WeightFrame)[2], sep = "")

# pasting dataframes together
ErrorFrame <- do.call(rbind.data.frame, errors_list)
OutputFrame <- do.call(rbind.data.frame, outputs_list)




##### Plotting Results


# Development of Mean Error per iteration
plot(rowMeans(abs(ErrorFrame)),
     type = "l",
     xlab = "Iteration",
     ylab = "Mean absolute error")

# Development of Weights over time
plot(WeightFrame$w1, type = "l",xlim = c(1,dim(train)[1]), ylim = c(min(WeightFrame),max(WeightFrame)), ylab = "Weights", xlab = "Iterations")
lines(WeightFrame$w2, col = "green")
# lines(WeightFrame$w3, col = "blue")
# lines(WeightFrame$w4, col = "red")
# lines(WeightFrame$w5, col = "orange")
# lines(WeightFrame$w6, col = "cyan")
# lines(WeightFrame$w7, col = "magenta")

# Empty vector for number of correct categorizations per iteration
NoCorr <- vector()

# Computing percentage of correct predictions per iteration
colnames(OutputFrame) <- paste("V",1:dim(OutputFrame)[2], sep = "")
Output_mat <- as.matrix(OutputFrame)

for (i in 1:iterations) {

  NoCorr[i] <- sum(output == Output_mat[i,]) / nrow(train)

}

# plotting number of correct predictions per iteration
plot(NoCorr, type = "l")


# Performance in training set after last iteration
table(output,round(OutputFrame[iterations,]))
Tags: r, neural-network, perceptron
1 Answer

0 votes

First of all, welcome to the world of neural networks :).

Second, I'd like to point you to a great article that I personally used to get a better understanding of backpropagation and the whole NN learning process: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/. It can be a bit heavy going at times, and for a general implementation I think it is much easier to follow pseudocode from an NN textbook. But working through that article is a great way to understand what is going on!

Third, hopefully solving your problem :) You already left yourself a comment asking whether that output_pred should really be rounded. Yes, it should... if you want to use output_pred to make a prediction! But if you want to use it for learning, that is generally a bad idea! The reason is that, if you round it for learning, an output of 0.51 rounded to 1 against a target output of 1 will not learn anything, because the rounded output equals the target and is therefore considered perfect. However, a prediction of 0.99 would have been much better than 0.51, so there is clearly still something left to learn!
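
To make this concrete, here is a minimal sketch of what the inner-loop step could look like without the rounding, reusing the names from your script (inputs, weights, bias, alpha, output, sigmoid, sig_dir, i). One extra assumption on my side: I evaluate the sigmoid derivative at the pre-activation net rather than at the already-activated output as your script does; that is a separate detail from the rounding point.

# Feed forward: keep the raw sigmoid output for learning
net         <- sum(inputs[i,] * as.numeric(weights)) + bias
output_pred <- sigmoid(net)
error       <- output_pred - output[i]

# Weight update as in the original loop, just with the un-rounded prediction
# (sig_dir evaluated at the pre-activation here -- my assumption, see above)
weight_adjustments <- inputs[i,] * (error * sig_dir(net)) * alpha
weights            <- weights - weight_adjustments

# Round only when you want a hard class label, e.g. for accuracy checks
predicted_class <- round(output_pred)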

I'm not 100% sure whether this solves all of your problems (I'm not an R programmer) and gets your accuracy up to 99%, but it should solve some of them, and hopefully the intuition is clear as well :)
