I'm trying to train a model to predict a customer's total purchase amount.
Here are the steps I've taken so far:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Preprocessing
def preprocess_input(df):
    df = df.copy()
    # Drop the User ID column
    df = df.drop('User ID', axis=1)
    # Binary-encode Gender
    df["Gender"] = df["Gender"].replace({"Female": 0, "Male": 1})
    # Split df into X and y
    y = df["Purchased"]
    X = df.drop(["Purchased"], axis=1)
    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, shuffle=True, random_state=1)
    # Scale X, fitting the scaler on the training set only
    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = preprocess_input(data)

# Training/Results
model = LogisticRegression()
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
print("Test Accuracy: {:.3f}".format(acc * 100))
But the result is
Test Accuracy: 0.000
What could be going wrong here?
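Your problem statement says you want to predict the total purchase amount, which makes this a regression problem, but LogisticRegression is a classification algorithm. Use a regressor such as LinearRegression instead, and evaluate it with a regression metric such as R² or mean absolute error rather than accuracy.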
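As a rough illustration of the regression approach, here is a minimal sketch. It assumes the target is a continuous amount and uses a synthetic DataFrame, because the original data is not shown; the column names Age and EstimatedSalary, the generated values, and the choice of LinearRegression are all assumptions made for the example, not something taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the asker's data (column names and values are assumptions).
rng = np.random.default_rng(1)
n = 200
demo = pd.DataFrame({
    "Gender": rng.integers(0, 2, size=n),
    "Age": rng.integers(18, 65, size=n),
    "EstimatedSalary": rng.normal(50_000, 15_000, size=n),
})
# Hypothetical continuous target: a purchase amount with some noise.
demo["Purchased"] = (
    20 + 0.002 * demo["EstimatedSalary"] + 1.5 * demo["Age"] + rng.normal(0, 5, size=n)
)

X = demo.drop("Purchased", axis=1)
y = demo["Purchased"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, shuffle=True, random_state=1
)

# Fit the scaler on the training set only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

reg = LinearRegression()
reg.fit(X_train_scaled, y_train)

# For regressors, score() returns R^2, not accuracy.
r2 = reg.score(X_test_scaled, y_test)
print("Test R^2: {:.3f}".format(r2))
```

If the real Purchased column turns out to be a 0/1 flag rather than an amount, a classifier is the right tool after all and the problem lies elsewhere, so it is worth checking the column's unique values first.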
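For LogisticRegression, model.score returns accuracy: the fraction of test rows whose predicted class exactly equals the true label. A score of 0 means not a single prediction matched. With a binary target, that would mean the model predicts the opposite of the true class for every instance. With a purchase amount as the target, each distinct amount is treated as its own class, so exact matches on the test set are very unlikely and an accuracy of 0 is not surprising.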
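To make the meaning of a zero accuracy concrete, here is a small pure-Python sketch. The `accuracy` helper and the sample numbers are made up for illustration; the helper only mirrors the idea of exact-match accuracy and is not scikit-learn's implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Binary target: accuracy is 0 only if every single prediction is flipped.
binary_true = [0, 1, 1, 0]
binary_flipped = [1, 0, 0, 1]
binary_acc = accuracy(binary_true, binary_flipped)

# Purchase amounts treated as class labels: predictions that are close
# but not identical still count as misses.
amount_true = [120, 87, 240, 15]
amount_pred = [118, 90, 238, 20]
amount_acc = accuracy(amount_true, amount_pred)

print(binary_acc, amount_acc)
```

The second case is why accuracy is a poor fit for an amount: predictions that are off by only a few units are scored the same as predictions that are wildly wrong.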