Inverse of prediction is correct in Scikit Learn Logistic Regression

Question

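In the minimal reproducible example below, I split the dataset into training and test sets, fit a logistic regression on the training set with scikit-learn, and predict y from x_test.

However, y_pred is only correct when inverted (i.e. 0 becomes 1 and 1 becomes 0): 1 - y_pred. Why is this the case? I can't figure out whether it is related to the scaling of x (I have tried with and without StandardScaler), to the logistic regression itself, or to the accuracy score calculation.

In my larger dataset this also happens, even when using different seeds for random_state. I have also tried this Logistic Regression with the same result.

EDIT: as pointed out by @Nester, it works without the standard scaler for this minimal dataset. A larger dataset is available here; StandardScaler does nothing on this larger dataset. I'll keep the OP's smaller dataset, as it might help in explaining the problem.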

# imports
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# small dataset
Y = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0]
X =[[0.38373581],[0.56824121],[0.39078066],[0.41532221],[0.3996311 ]
    ,[0.3455455 ],[0.55867358],[0.51977073],[0.51937625],[0.48718916]
    ,[0.37019272],[0.49478954],[0.37277804],[0.6108499 ],[0.39718093]
    ,[0.33776591],[0.36384773],[0.50663667],[0.3247984 ]]


x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=42, stratify=Y)
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

y_pred = 1 - y_pred #          <- why?

accuracy_score(y_test,y_pred)
1.0

Accuracy on the larger dataset:

accuracy_score(y_test,y_pred)
0.7  # if inverted

Thanks for reading.

Answer 1

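X and Y have no relationship at all, so the model performs poorly. That is why 1 - pred can appear to perform better. With more than two classes, the situation would be even worse.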

%matplotlib inline
import matplotlib.pyplot as plt

# X, Y and the sklearn imports are the ones defined in the question.
# No random_state here, so every run draws a different split.
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, stratify=Y)
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(x_train, y_train)

# Plot the scaled feature against the label for the train and test points.
scaler = clf.named_steps['standardscaler']
plt.scatter(scaler.transform(x_train), y_train)
plt.scatter(scaler.transform(x_test), y_test)
print(clf.score(x_test, y_test))
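
Because the snippet above leaves out random_state, the score it prints changes from run to run. A rough way to see how much is to loop over several seeds. This is only a sketch: the seed range of 20 is an arbitrary choice of mine, not something from the question. With 19 samples and test_size=0.15 the held-out set has just three points, so each accuracy can only be 0, 1/3, 2/3 or 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small dataset from the question.
Y = np.array([1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0])
X = np.array([[0.38373581], [0.56824121], [0.39078066], [0.41532221], [0.3996311],
              [0.3455455], [0.55867358], [0.51977073], [0.51937625], [0.48718916],
              [0.37019272], [0.49478954], [0.37277804], [0.6108499], [0.39718093],
              [0.33776591], [0.36384773], [0.50663667], [0.3247984]])

accuracies = []
for seed in range(20):  # assumption: 20 seeds, chosen only for illustration
    x_train, x_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.15, random_state=seed, stratify=Y)
    clf = make_pipeline(StandardScaler(), LogisticRegression())
    clf.fit(x_train, y_train)
    accuracies.append(accuracy_score(y_test, clf.predict(x_test)))

print("test set size:", len(y_test))
print("distinct accuracies seen:", sorted(set(accuracies)))
print("mean accuracy over seeds:", np.mean(accuracies))
```

If the mean stays near 0.5 while individual splits swing between extremes, a single split scoring 0.0 (and therefore 1.0 when inverted) is more likely noise than a sign that the labels are flipped.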

The same lack of relationship holds for your larger dataset.

Try to identify other features that could help you predict Y.
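
On the remark that 1 - pred can score better: for binary 0/1 labels, inverting every prediction turns each hit into a miss and vice versa, so the two accuracies always sum to 1. Any model that lands below 0.5 on a given split will therefore look good when inverted. A minimal check, using made-up labels and predictions (both arrays are hypothetical, not taken from the question):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels and predictions, for illustration only.
y_true = np.array([1, 0, 0, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0])

acc = accuracy_score(y_true, y_pred)               # 1 of 6 correct
acc_inverted = accuracy_score(y_true, 1 - y_pred)  # 5 of 6 correct

# For binary labels the two accuracies are complementary.
print(acc, acc_inverted, acc + acc_inverted)
```

With more than two classes there is no such complement to fall back on, which fits the point that the situation would then be even worse.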

Answer 2

Have you tried running the model without StandardScaler()? Your data doesn't look like it needs rescaling.
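
One way to try this suggestion without relying on a single three-sample test set is to cross-validate both variants on the whole small dataset. This is a sketch under my own assumptions: the 3-fold StratifiedKFold and its random_state are illustrative choices, not part of the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small dataset from the question.
Y = np.array([1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0])
X = np.array([[0.38373581], [0.56824121], [0.39078066], [0.41532221], [0.3996311],
              [0.3455455], [0.55867358], [0.51977073], [0.51937625], [0.48718916],
              [0.37019272], [0.49478954], [0.37277804], [0.6108499], [0.39718093],
              [0.33776591], [0.36384773], [0.50663667], [0.3247984]])

# Assumption: 3 stratified folds with a fixed seed, chosen for illustration.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

with_scaler = make_pipeline(StandardScaler(), LogisticRegression())
without_scaler = LogisticRegression()

scores_scaled = cross_val_score(with_scaler, X, Y, cv=cv, scoring="accuracy")
scores_unscaled = cross_val_score(without_scaler, X, Y, cv=cv, scoring="accuracy")

print("with StandardScaler:   ", scores_scaled, scores_scaled.mean())
print("without StandardScaler:", scores_unscaled, scores_unscaled.mean())
```

Note that LogisticRegression applies L2 regularization by default, so scaling can change the fitted coefficient; comparing both variants on the same folds shows whether that difference matters for this data.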
