Why can't sklearn MLPClassifier predict XOR?

Question

In theory, an MLP with a single hidden layer of just 3 neurons is enough to predict XOR correctly. It may occasionally fail to converge, but 4 neurons are a safe bet.

Here is an example.
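To see why so few units suffice, here is a minimal hand-constructed sketch (my own illustration, not from the playground): two ReLU units with fixed weights already compute binary XOR exactly, so three trainable units leave room to spare.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked weights: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
# The output h1 - 2*h2 reproduces XOR on {0, 1}^2
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # (2 inputs, 2 hidden units)
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

print(relu(X @ W1 + b1) @ w2)   # [0. 1. 1. 0.] -- exactly XOR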

I tried to reproduce this with sklearn.neural_network.MLPClassifier:

from sklearn import neural_network
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np


x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
# Labels in {-1, +1}: XOR of the coordinate signs
y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

model = neural_network.MLPClassifier(
    hidden_layer_sizes=(3,), n_iter_no_change=100,
    learning_rate_init=0.01, max_iter=1000
).fit(x_train, y_train)

x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

prediction = model.predict(x_test)
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')

I can only get an accuracy of about 0.75, while the TensorFlow Playground model is perfect. Any idea what makes the difference?

I also tried it with TensorFlow:

import tensorflow as tf

model = tf.keras.Sequential(layers=[
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1)  # linear output, no sigmoid
])

model.compile(loss=tf.keras.losses.binary_crossentropy)

x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = (tmp[:, 0] ^ tmp[:, 1])  # labels in {0, 1} this time

model.fit(x=x_train, y=y_train)

x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = (tmp[:, 0] ^ tmp[:, 1])

prediction = model.predict(x_test) > 0.5
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')

With this model I get results similar to the scikit-learn model... so it isn't just a scikit-learn problem. Am I missing some important hyperparameter?

Edit

OK, after changing the loss to mean squared error instead of cross-entropy, I now get 0.92 accuracy with the TensorFlow example. I guess this is a problem with MLPClassifier?
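One plausible contributor here (an inference on my part, not something confirmed in the thread): the final Dense(1) layer has no sigmoid, so binary_crossentropy receives raw logits where it expects probabilities. A sketch of the two usual fixes, assuming that is the issue:

# Either tell the loss to expect logits and threshold predictions at 0 ...
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
# prediction = model.predict(x_test) > 0

# ... or add a sigmoid to the output layer and keep the 0.5 threshold:
# tf.keras.layers.Dense(1, activation='sigmoid')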

python tensorflow machine-learning scikit-learn neural-network
2 Answers
1 vote

Increasing the learning rate and/or the maximum number of iterations seems to make the sklearn version work. Presumably different solvers need different values, and it isn't clear to me what the TF playground uses.
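For example, a configuration along these lines (the exact values are my own guess, not tested settings from this thread) keeps the question's architecture but trains faster and longer:

# Same setup as the question, with a larger learning rate and more iterations
model = neural_network.MLPClassifier(
    hidden_layer_sizes=(3,),
    learning_rate_init=0.1,    # up from 0.01
    max_iter=5000,             # up from 1000
    n_iter_no_change=100,
).fit(x_train, y_train)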


0 votes

I'd like to offer an answer, because I stumbled over the same problem. For a research project (studying convergence to XOR solutions) I had implemented my network in NumPy, writing the algebra and gradients myself, but I had little luck getting sklearn.neural_network.MLPClassifier to solve XOR with many of its default hyperparameters. However, once I configured the model exactly the way I had implemented it in NumPy, MLPClassifier solved it without any problem.

So, here are the hyperparameter settings I found that lead to convergence on the simple XOR problem:

import numpy as np
from sklearn.neural_network import MLPClassifier

# XOR data
X = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])
y = np.array([-1, 1, 1, -1])

model = MLPClassifier(
    hidden_layer_sizes=(3,),
    activation='logistic',   # the logistic sigmoid instead of relu
    learning_rate_init=1,
    learning_rate='constant',
    max_iter=100000,         # I want to ensure convergence
    n_iter_no_change=1000,   # I want to ensure convergence
    tol=1e-6,                # shrinking tol makes false early termination less likely
    batch_size='auto',
    verbose=True,
    # random_state=42,       # commented out to see convergence for a variety of seeds
    solver='sgd',            # the default 'adam' often fails to converge here
).fit(X, y)
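A quick sanity check on the four training points (the expected output assumes the run converged):

print(model.predict(X))   # expect [-1  1  1 -1], matching the XOR labels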

Now, for your larger simulated XOR dataset, here are the modifications I made to your hyperparameters:


import numpy as np
from sklearn.neural_network import MLPClassifier

x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

model = MLPClassifier(
    hidden_layer_sizes=(3,),
    max_iter=1000000,          # more training epochs
    n_iter_no_change=10000,    # convergence is the goal, and I have time to watch this train
    learning_rate_init=0.01,
    tol=1e-6,                  # lowered tol to avoid detecting convergence too early
    solver='sgd',              # the default 'adam' often fails to converge here
    verbose=True,              # follow convergence at the console
    learning_rate='constant',
    activation='logistic',
    alpha=0,                   # regularization term, not needed here
    batch_size=len(y_train),   # your data isn't so large, use all of it at once
    # random_state=42
).fit(x_train, y_train)

# Test out your model on points in the four quadrants
X = np.array([[-.75, -.75],
              [.75, -.75],
              [-.75, .75],
              [.75, .75]])
model.predict(X)
# array([-1,  1,  1, -1])
# Seems to have learned the XOR logic

(model.predict(x_train) == y_train).mean()
# 0.93
# Training longer with a lower tol, accuracy can continue to improve significantly

So, to get the most out of the SKLearn class you may have to experiment a bit with the hyperparameters, but overall the algorithm does work, and you have plenty of flexibility to tune a model until it fits your data.
