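Theoretically, an MLP with a single hidden layer of just 3 neurons is enough to predict XOR correctly. It can sometimes fail to converge, but 4 neurons is a safe bet.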
Here is an example.
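As a sanity check on the capacity claim, here is a hand-built sketch of a single-hidden-layer network with 4 ReLU units that classifies the quadrant-XOR data used below exactly, with no training at all. The construction is my own and does not come from the question or from TensorFlow Playground. It relies on the identity |a| = relu(a) + relu(-a): the output |x1 + x2| - |x1 - x2| has the same sign as x1 * x2, so it is negative exactly when the two inputs have opposite signs. The function name xor_by_hand is hypothetical.

```python
import numpy as np


def xor_by_hand(x):
    """Quadrant XOR with 4 hand-set ReLU hidden units (weights chosen by hand, not learned)."""
    # Hidden layer: each row of W is one neuron's weight vector; no biases are needed.
    W = np.array([[1.0, 1.0],    # relu(x1 + x2)
                  [-1.0, -1.0],  # relu(-x1 - x2)
                  [1.0, -1.0],   # relu(x1 - x2)
                  [-1.0, 1.0]])  # relu(-x1 + x2)
    h = np.maximum(0.0, x @ W.T)
    # Output unit: |x1 + x2| - |x1 - x2|, which has the same sign as x1 * x2
    out = h @ np.array([1.0, 1.0, -1.0, -1.0])
    # XOR of (x1 > 0, x2 > 0) is true exactly when x1 and x2 have opposite signs
    return np.where(out < 0, 1, -1)


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (1000, 2))
tmp = x > 0
y = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1  # same +1/-1 labelling as the question
print((xor_by_hand(x) == y).mean())  # 1.0
```

This only shows that suitable weights exist; whether gradient descent finds them from a random initialization is a separate question, and that is what the rest of this thread is about.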
I tried to reproduce this with sklearn.neural_network.MLPClassifier:
from sklearn import neural_network
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np
x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1
model = neural_network.MLPClassifier(
hidden_layer_sizes=(3,), n_iter_no_change=100,
learning_rate_init=0.01, max_iter=1000
).fit(x_train, y_train)
x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1
prediction = model.predict(x_test)
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')
I only get about 0.75 accuracy, while the TensorFlow Playground model is perfect. Any idea what causes the difference?
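One way to read a score of almost exactly 0.75 (my interpretation, not something the question establishes): with uniform inputs each quadrant holds about a quarter of the points, so a model that gets three quadrants right and one entirely wrong scores about 0.75. The sketch below is a small diagnostic that breaks accuracy down by quadrant; quadrant_accuracy is a hypothetical helper, not part of scikit-learn. It is demonstrated on synthetic predictions, and could be called with x_test, y_test and prediction from the code above to check whether that is what happened.

```python
import numpy as np


def quadrant_accuracy(x, y_true, y_pred):
    """Accuracy of y_pred restricted to each sign quadrant of the 2-D inputs x."""
    result = {}
    for sx in (False, True):
        for sy in (False, True):
            # Points whose (x1 > 0, x2 > 0) pattern equals (sx, sy)
            mask = ((x[:, 0] > 0) == sx) & ((x[:, 1] > 0) == sy)
            name = ('+' if sx else '-') + ('+' if sy else '-')
            result[name] = float((y_true[mask] == y_pred[mask]).mean())
    return result


# Demo with synthetic predictions (NOT output from a trained model):
# correct everywhere except the (+, +) quadrant, where every label is flipped.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (1000, 2))
tmp = x > 0
y_true = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1
y_pred = np.where(tmp[:, 0] & tmp[:, 1], -y_true, y_true)

print(quadrant_accuracy(x, y_true, y_pred))
print((y_true == y_pred).mean())  # close to 0.75, since about a quarter of the points are in (+, +)
```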
I also tried it with TensorFlow:
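import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_score, recall_score

model = tf.keras.Sequential(layers=[
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation='relu'),
    # No activation: this layer outputs an unbounded value, not a probability in [0, 1]
    tf.keras.layers.Dense(1)
])
# binary_crossentropy defaults to from_logits=False, i.e. it expects probabilities;
# with no optimizer argument, compile() uses its default ('rmsprop')
model.compile(loss=tf.keras.losses.binary_crossentropy)
x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = (tmp[:, 0] ^ tmp[:, 1]).astype(np.float32)  # 0/1 labels this time
model.fit(x=x_train, y=y_train)  # epochs defaults to 1
x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = (tmp[:, 0] ^ tmp[:, 1])
prediction = model.predict(x_test).ravel() > 0.5  # flatten (1000, 1) to (1000,)
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')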
With this model I get results similar to the scikit-learn model... so it's not just a scikit-learn problem. Am I missing some important hyperparameter?
Edit
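OK, after changing the loss to mean squared error instead of cross-entropy, I now get 0.92 precision in the TensorFlow example. I guess that's the problem with MLPClassifier?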
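A plausible explanation for why switching to MSE helped (my reading, not confirmed in the thread): the last Dense(1) layer has no activation, so its output is not a probability, while binary_crossentropy defaults to from_logits=False and clips its input to [ε, 1 − ε]. Whenever the raw output falls outside that range, the loss is flat and the gradient is zero. The NumPy sketch below mimics that clipping, assuming ε = 1e-7 (Keras' default backend epsilon), and compares finite-difference gradients for three setups. It is an illustration, not Keras' actual code. The usual fixes are a sigmoid activation on the output layer, or loss=tf.keras.losses.BinaryCrossentropy(from_logits=True).

```python
import numpy as np

EPS = 1e-7  # assumed to match Keras' default backend epsilon


def bce_on_probability(p, y):
    """Binary cross-entropy on a probability, clipped to [EPS, 1 - EPS] (mimicking from_logits=False)."""
    p = np.clip(p, EPS, 1 - EPS)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def mse(p, y):
    return (p - y) ** 2


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def numeric_grad(f, z, h=1e-4):
    """Central finite-difference derivative of f at z."""
    return (f(z + h) - f(z - h)) / (2 * h)


z, y = -0.5, 1.0  # raw output of a linear last layer, true label 1

g_linear_bce = numeric_grad(lambda v: bce_on_probability(v, y), z)
g_sigmoid_bce = numeric_grad(lambda v: bce_on_probability(sigmoid(v), y), z)
g_linear_mse = numeric_grad(lambda v: mse(v, y), z)

print(g_linear_bce)   # 0.0: the clip flattens the loss, so no gradient reaches the network
print(g_sigmoid_bce)  # about sigmoid(-0.5) - 1 = -0.62
print(g_linear_mse)   # about 2 * (-0.5 - 1) = -3.0
```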
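Increasing the learning rate and/or the maximum number of iterations seems to make the sklearn version work. Different solvers probably need different values, and it's not clear to me what TensorFlow Playground uses.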
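One way to explore the solver question is a small grid search with scikit-learn's GridSearchCV. This is a sketch: the grid values, the 2000-sample training set and max_iter=300 are my own choices to keep it quick, not tuned recommendations, and the scores will depend on the random seed. Note that learning_rate_init only affects the 'sgd' and 'adam' solvers.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (2000, 2))  # smaller than the question's 10000, to keep the search quick
tmp = x > 0
y = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

param_grid = {
    'solver': ['adam', 'sgd', 'lbfgs'],
    'learning_rate_init': [0.01, 0.1],  # ignored by 'lbfgs'
}
search = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(3,), max_iter=300, random_state=0),
    param_grid,
    cv=3,
)
with warnings.catch_warnings():
    # Some combinations will stop at max_iter; silence those warnings for readability
    warnings.simplefilter('ignore', category=ConvergenceWarning)
    search.fit(x, y)

for params, score in zip(search.cv_results_['params'], search.cv_results_['mean_test_score']):
    print(params, round(score, 3))
print('best:', search.best_params_)
```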
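I'd like to offer an answer, because I ran into the same problem. For a research project (studying the convergence of XOR solutions), I implemented my network in NumPy, writing the algebra and gradients myself. When I then deployed the same solution with sklearn.neural_network.MLPClassifier, I had little luck solving XOR with many of the default hyperparameters. However, once I configured the model exactly the way I had implemented it in NumPy, MLPClassifier had no problems.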
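The poster's NumPy code isn't shown, so the following is only a minimal sketch, under my own assumptions, of what writing the algebra and gradients by hand can look like: a 2-3-1 network with logistic (sigmoid) units, cross-entropy loss and full-batch gradient descent on the four XOR points, using 0/1 targets. The names train_xor and predict are hypothetical. Whether a given seed reaches the XOR solution or stalls in a local minimum depends on the initialization.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_xor(n_hidden=3, lr=1.0, n_iter=10000, seed=0):
    """Full-batch gradient descent for a 2-n_hidden-1 network with logistic units and cross-entropy loss."""
    X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)  # 0/1 targets for the sigmoid output
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (2, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 1, (n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(n_iter):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        p_safe = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0) when recording the loss
        losses.append(float(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))))
        # Backward pass: with sigmoid + cross-entropy, dL/dz_out simplifies to (p - y) / N
        d_out = (p - y) / len(X)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # Gradient-descent updates
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Return 0/1 predictions from the trained parameters."""
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int).ravel()


params, losses = train_xor()
print(losses[0], losses[-1])
print(predict(params, np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)))
```

If the run converged, the last line prints [0 1 1 0]; a seed that stalls prints something else, which is exactly the kind of seed dependence worth checking.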
So here are the hyperparameter settings I found that make the simple XOR problem converge:
import numpy as np
from sklearn.neural_network import MLPClassifier

# XOR data
X = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])
y = np.array([-1, 1, 1, -1])

model = MLPClassifier(
    hidden_layer_sizes=(3,),   # (3) without the comma is just the integer 3; (3,) is the one-element tuple
    activation='logistic',     # logistic sigmoid instead of the default relu
    learning_rate_init=1,
    learning_rate='constant',
    max_iter=100000,           # large, to make sure training has time to converge
    n_iter_no_change=1000,     # likewise
    tol=1e-6,                  # a smaller tol makes premature early stopping less likely
    batch_size='auto',
    verbose=True,
    # random_state=42,         # commented out on purpose, to see convergence across many random seeds
    solver='sgd',              # the default 'adam' often failed to converge for me
).fit(X, y)
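Leaving random_state unset is a way to check convergence across seeds. A rough sketch of doing that systematically: it reuses the settings above, but cuts max_iter to 5000 and sweeps only 5 seeds (both my own choices, to keep it short), then counts how many seeds reproduce the XOR truth table exactly on the four points.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([-1, 1, 1, -1])

solved = []
for seed in range(5):
    model = MLPClassifier(
        hidden_layer_sizes=(3,),
        activation='logistic',
        solver='sgd',
        learning_rate='constant',
        learning_rate_init=1,
        max_iter=5000,          # much lower than 100000, to keep the sweep short
        n_iter_no_change=1000,
        tol=1e-6,
        random_state=seed,      # fixed per run, so each result is reproducible
    )
    with warnings.catch_warnings():
        # Runs that hit max_iter raise ConvergenceWarning; silence it for readability
        warnings.simplefilter('ignore', category=ConvergenceWarning)
        model.fit(X, y)
    solved.append(bool((model.predict(X) == y).all()))

print(f'{sum(solved)} of {len(solved)} seeds reproduce XOR on the 4 points')
```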
Now, for the larger simulated XOR dataset from the question, here are my modifications to your hyperparameters:
import numpy as np
from sklearn.neural_network import MLPClassifier

x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

model = MLPClassifier(
    hidden_layer_sizes=(3,),
    max_iter=1000000,           # many more training epochs
    n_iter_no_change=10000,     # convergence matters more than speed here, and I have time to watch it train
    learning_rate_init=.01,
    tol=1e-6,                   # lowered somewhat, to avoid declaring convergence too early
    solver='sgd',               # the default 'adam' often failed to converge for me
    verbose=True,               # print progress to the console to monitor convergence
    learning_rate='constant',
    activation='logistic',      # logistic sigmoid instead of relu
    alpha=0,                    # L2 regularization term; I don't think it's needed here
    batch_size=len(y_train),    # the dataset isn't that large, so train on all of it at once
    # random_state=42
).fit(x_train, y_train)
# Test out your model
X = np.array([[-.75,-.75],
[.75,-.75],
[-.75,.75],
[.75,.75]])
model.predict(X)
# array([-1, 1, 1, -1])
# Seems to have learned the XOR logic
(model.predict(x_train) == y_train).mean()  # accuracy on the training set
# 0.93
# With a lower tol, training runs longer and accuracy can keep improving significantly
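So, to get the most out of the sklearn class you may need to experiment a bit with the hyperparameters, but on the whole the algorithm does work, and it gives you a lot of flexibility to tune the model until it fits your data.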