I'm a beginner in data science and am currently building a model for the IBM employee attrition dataset. How do I resolve this error?
# LogisticRegression
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.model_selection import train_test_split
#Copy the DataFrame
df1 = df.copy()
#Convert categorical variables to numeric
dummy_df = pd.get_dummies(df1, columns=["Attrition", "BusinessTravel", "Department", "EducationField",
"Gender", "JobRole", "OverTime", "MaritalStatus"], drop_first = True)
dummy_df = pd.concat([df1, dummy_df], axis=1)
dummy_df = dummy_df.drop(["Attrition", "BusinessTravel", "Department", "EducationField",
"Gender", "JobRole", "OverTime", "MaritalStatus"], axis=1)
dummy_df.rename({"Attrition_Yes":"Attrition", "OverTime_Yes":"OverTime"}, axis=1, inplace=True)
#Drop duplicate columns
dummy_df = dummy_df.loc[:,~dummy_df.columns.duplicated()]
X = dummy_df.drop("Attrition", axis=1).values
y = dummy_df["Attrition"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15, stratify=y)
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
logreg.score(y_pred, y_test)
ValueError: Expected 2D array, got 1D array instead:
You are probably passing a pandas Series instead of a DataFrame. Instead of df['column'], pass df[['column']]. If that doesn't work, please post your code.
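To make the Series versus DataFrame distinction concrete, here is a minimal sketch. The frame and its column names (Age, Income) are made up purely for illustration and are not taken from the IBM dataset.

```python
import pandas as pd

# Hypothetical toy frame, only used to compare shapes.
df = pd.DataFrame({"Age": [30, 40, 50], "Income": [1000, 2000, 3000]})

series = df["Age"]     # single brackets return a pandas Series (1D)
frame = df[["Age"]]    # double brackets return a one-column DataFrame (2D)

print(series.shape)        # (3,)
print(frame.shape)         # (3, 1)
print(series.values.ndim)  # 1
print(frame.values.ndim)   # 2

# Dropping a column from a DataFrame and taking .values still gives a 2D array.
remaining = df.drop("Age", axis=1).values
print(remaining.shape)     # (3, 1)
```

scikit-learn estimators expect the 2D form for features, so the double-bracket selection is the one to use when a single column serves as X.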
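The line to look at is:
X = dummy_df.drop("Attrition", axis=1).values
Model fitting and transforming expect a 2D array for X and a 1D array for y. Note that .values is an attribute, not a method (calling .values() raises a TypeError), and it yields a 1D array only when applied to a single column (a Series); applied to a DataFrame it returns a 2D array. Either way, it is simpler to drop it and pass the DataFrame directly, which also keeps the column names:
X = dummy_df.drop("Attrition", axis=1)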
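Going by the code in the question, the traceback may also come from the last line rather than from X. LogisticRegression.score takes the features and the true labels, score(X, y), and runs predict internally, so passing y_pred as the first argument hands it a 1D array where a 2D feature matrix is expected. The sketch below reproduces this on synthetic data (the random features and the labelling rule are assumptions for illustration, not the IBM dataset) and shows two equivalent ways of getting the accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and target (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=15, stratify=y
)

logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)

# Passing predictions as the first argument: score() treats them as features.
try:
    logreg.score(y_pred, y_test)
    raised = False
except ValueError as err:
    raised = True
    print(type(err).__name__)

# score() expects the test features and the true labels.
acc_from_score = logreg.score(X_test, y_test)

# Equivalent: compare the true labels with predictions already computed.
acc_from_metric = accuracy_score(y_test, y_pred)

print(acc_from_score == acc_from_metric)
```

If the predictions are already computed, metrics.accuracy_score(y_test, y_pred) avoids a second call to predict; otherwise logreg.score(X_test, y_test) is the shorter form.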