为什么我在可视化测试集结果时遇到问题？

Question

作为初学者，我正在研究泰坦尼克号生存数据集数据在这里并对它们应用逻辑回归。清理数据后如下所示：

我想根据下面的代码可视化测试集结果，以了解幸存者和非幸存者。

from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('magenta', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('magenta', 'blue'))(i), label = j)
plt.legend()
plt.show()

但我收到此错误：

ValueError                                Traceback (most recent call last)
Cell In[69], line 6
      3 X_set, y_set = X_test, y_test
      4 X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
      5                      np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
----> 6 plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
      7              alpha = 0.75, cmap = ListedColormap(('magenta', 'blue')))
      8 plt.xlim(X1.min(), X1.max())
      9 plt.ylim(X2.min(), X2.max())

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\linear_model\_base.py:419, in LinearClassifierMixin.predict(self, X)
    405 """
    406 Predict class labels for samples in X.
    407 
   (...)
    416     Vector containing the class labels for each sample.
    417 """
    418 xp, _ = get_namespace(X)
--> 419 scores = self.decision_function(X)
    420 if len(scores.shape) == 1:
    421     indices = xp.astype(scores > 0, int)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\linear_model\_base.py:400, in LinearClassifierMixin.decision_function(self, X)
    397 check_is_fitted(self)
    398 xp, _ = get_namespace(X)
--> 400 X = self._validate_data(X, accept_sparse="csr", reset=False)
    401 scores = safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_
    402 return xp.reshape(scores, -1) if scores.shape[1] == 1 else scores

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\base.py:588, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    585     out = X, y
    587 if not no_val_X and check_params.get("ensure_2d", True):
--> 588     self._check_n_features(X, reset=reset)
    590 return out

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\base.py:389, in BaseEstimator._check_n_features(self, X, reset)
    386     return
    388 if n_features != self.n_features_in_:
--> 389     raise ValueError(
    390         f"X has {n_features} features, but {self.__class__.__name__} "
    391         f"is expecting {self.n_features_in_} features as input."
    392     )

ValueError: X has 2 features, but LogisticRegression is expecting 6 features as input.

请问有什么帮助吗？

我想展示如下所示的内容：下图是其他数据集。

完整代码：

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    training_set = pd.read_csv('Train_Titanic.csv')
    sns.heatmap(training_set.isnull(), yticklabels=False, cbar=False, cmap='Blues')
    #if we want to dop Cabin column from the memory we set implace=True 
    training_set.drop('Cabin', axis=1, inplace=True)
    training_set.drop(['Name','Ticket','Embarked'], axis=1, inplace=True)
    plt.figure(figsize=(15,10))
    sns.boxplot(x='Sex',y='Age',data=training_set)
    
    def Fill_Age(data):
        age = data[0]
        sex = data[1]
        
        if pd.isnull(age):        #if the data has null
            if sex is 'male':
                return 29         #This is average from the above boxplot of male
            else:                 #This means if the sex is female 
                return 25         #This is average from the above boxplot of female
        else:
            return age            #This will return the same age if it isn't null
    training_set['Age'] = training_set[['Age','Sex']].apply(Fill_Age, axis=1)
    training_set.drop('PassengerId', axis=1, inplace=True)
    male = pd.get_dummies(training_set['Sex'])
    male = pd.get_dummies(training_set['Sex'], drop_first=True)
    male = pd.get_dummies(training_set['Sex'], drop_first=True, dtype=int)
    #Let's drop Sex column from our original data
    training_set.drop('Sex', axis=1, inplace=True)
    training_set
    #Now, let's add the male column that we have created to the original dataset
    training_set = pd.concat([training_set, male], axis=1)

#Now let's take our data and assign it to X (input) and y(output)
X = training_set.drop('Survived', axis=1).values
X

y = training_set['Survived'].values
y

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test= train_test_split(X,y, test_size = 0.2, random_state = 10)
    from sklearn.linear_model import LogisticRegression # We import class (LogisticRegression)
    classifier = LogisticRegression(random_state=0)     # We took an object from the class
    classifier.fit(X_train,y_train)      
    #We are performing our training
    y_predict = classifier.predict(X_test)

Answer 1

确保您的输入数据（特别是 X）包含您打算用于预测的所有特征。在这种情况下，它应至少包含以下列：“Pclass”、“Sex”、“Age”、“SibSp”、“Parch”和“Fare”。它应该具有与逻辑回归模型期望的相同数量的特征。如果您的模型需要 6 个特征，则需要提供具有 6 个特征的输入数据。如果您正在使用逻辑回归模型并且预计它只有 2 个特征，那么模型配置或初始化可能存在问题。确认您的输入数据已正确预处理。确保您使用适当的训练和测试数据集。如果您使用错误的数据集或数据形状不匹配，则可能会出现错误。

如果您向我提供更详细的代码或有关如何准备数据和拟合逻辑回归模型的信息，我可以更好地帮助您。

为什么我在可视化测试集结果时遇到问题？

问题描述投票：0回答：1

1个回答

最新问题

为什么我在可视化测试集结果时遇到问题？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1