X has 29 features, but RandomForestClassifier is expecting 30 features as input

I am trying to write a machine learning model that uses RandomForestClassifier to predict breast cancer. The code is as follows:

from sklearn.model_selection import train_test_split
print("Shape of training set:", x_train.shape)
print("Shape of test set:", x_test.shape)

Shape of training set: (292, 30)

Shape of test set: (91, 29)

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_train = ss.fit_transform(x_train)
X_test = ss.fit_transform(x_test)

Instantiating RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
rand_clf = RandomForestClassifier(criterion = 'entropy', max_depth = 11, max_features = 'auto', min_samples_leaf = 2, min_samples_split = 3, n_estimators = 130)
rand_clf.fit(X_train, y_train)

This is where I am stuck:

y_pred = rand_clf.predict(X_test)

The error shown is:

ValueError: X has 29 features, but RandomForestClassifier is expecting 30 features as input

How can I fix this? As it stands, x_train and x_test do not have the same number of columns.

python machine-learning random-forest
1 Answer

The problem is here:

Shape of training set: (292, 30)

Shape of test set: (91, 29)

The training set and the test set must have the same number of features: either 29 for both or 30 for both.
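
One way to track down and remove such a mismatch is sketched below. It is only an illustration under stated assumptions: the question does not show how x_train and x_test were created, so scikit-learn's bundled breast cancer dataset stands in for the real data, and the dropped column ("mean radius") is an arbitrary choice made to reproduce a 30 vs. 29 mismatch. The sketch lists the columns that differ, keeps only the shared ones, and then trains and predicts. Separately from the shape error, it fits the scaler on the training data only and calls transform on the test data, which is the usual practice; the question's code calls fit_transform on both.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset,
# which has 30 feature columns. The asker's real data source is not shown.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Splitting features and labels in one call keeps the same columns in both halves.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Simulate the reported mismatch by dropping one column from the test set only.
x_test = x_test.drop(columns=["mean radius"])

# Diagnose: which columns are in the training set but not in the test set?
missing_in_test = sorted(set(x_train.columns) - set(x_test.columns))
print("Missing in test:", missing_in_test)

# Align: keep only the columns present in both sets, in training-set order.
shared = [c for c in x_train.columns if c in x_test.columns]
x_train, x_test = x_train[shared], x_test[shared]

# Fit the scaler on the training data only, then reuse it for the test data.
ss = StandardScaler()
X_train = ss.fit_transform(x_train)
X_test = ss.transform(x_test)

# Hyperparameters here are illustrative, not the asker's full configuration.
rand_clf = RandomForestClassifier(n_estimators=130, random_state=0)
rand_clf.fit(X_train, y_train)
y_pred = rand_clf.predict(X_test)
print("Feature counts:", X_train.shape[1], X_test.shape[1])
```

Dropping the extra column from the training set is only one option. If the column was removed from the test set by mistake, restoring it there (so that both sets have 30 features) keeps more information for the model.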
