I'm trying to find the best feature set using a random forest, and I need to split the dataset into training and test sets. Here is my code:
```python
from sklearn.model_selection import train_test_split

def train_test_split(x, y):
    # split data train 70 % and test 30 %
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.3, random_state=42)
    # normalization
    x_train_N = (x_train - x_train.mean()) / (x_train.max() - x_train.min())
    x_test_N = (x_test - x_test.mean()) / (x_test.max() - x_test.min())

train_test_split(data, data_y)
```
The arguments `data` and `data_y` are being passed correctly, but I get the following error. I don't understand why this happens.
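The function you define has the same name as the function you import from `sklearn.model_selection`, so your definition shadows the import. Just rename your function. Something like this: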
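```python
from sklearn.model_selection import train_test_split

def my_train_test_split(x, y):
    # split data: train 70 %, test 30 % (test_size, not train_size, is the 30 %)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
    # mean normalization, using training-set statistics for both splits
    mean = x_train.mean()
    value_range = x_train.max() - x_train.min()
    x_train_N = (x_train - mean) / value_range
    x_test_N = (x_test - mean) / value_range
    return x_train_N, x_test_N, y_train, y_test

x_train_N, x_test_N, y_train, y_test = my_train_test_split(data, data_y)
```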
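Explanation: Python does not support overloading functions by signature. A `def` statement simply binds a name, so your `def train_test_split` replaced the name you had just imported from sklearn. Inside your function, `train_test_split(x, y, train_size=0.3, random_state=42)` therefore calls your own two-argument function again, not sklearn's, and fails with a `TypeError` because your function does not accept those keyword arguments. Since both functions also take the same kinds of arguments, IMO giving them different names is the only sensible fix.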
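The shadowing can be reproduced without sklearn. In this minimal sketch, `math.isclose` stands in for sklearn's `train_test_split` (the substitution is purely illustrative): after the `def`, the module-level name `isclose` refers to the local function, so the call inside it recurses into itself and rejects the keyword argument.

```python
import math
from math import isclose  # stand-in for sklearn's train_test_split

def isclose(a, b):
    # This def rebinds the name `isclose`; the call below refers to this
    # very function (not math.isclose), which has no `rel_tol` parameter.
    return isclose(a, b, rel_tol=1e-9)

message = ""
try:
    isclose(1.0, 1.0)
except TypeError as exc:
    message = str(exc)
    print(message)

print(isclose is math.isclose)  # False: the import has been shadowed
```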
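Alternatively, keep your own function name and import sklearn's function under an alias, then call `sklearn_train_test_split` inside your function:

```python
from sklearn.model_selection import train_test_split as sklearn_train_test_split
```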
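A fuller sketch of the alias approach, assuming `x` is a 2-D numeric array (or something `np.asarray` can convert) with samples in rows. The per-column statistics, the guard against zero-range columns, and the synthetic data standing in for `data`/`data_y` are my own assumptions, not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split as sklearn_train_test_split


def train_test_split(x, y):
    """Split 70/30 and mean-normalize using training-set statistics only."""
    # The alias keeps sklearn's function reachable, so this call does not recurse.
    x_train, x_test, y_train, y_test = sklearn_train_test_split(
        x, y, test_size=0.3, random_state=42
    )
    x_train = np.asarray(x_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    # Per-column statistics from the training split, reused for the test split,
    # so no information from the test data leaks into preprocessing.
    mean = x_train.mean(axis=0)
    value_range = x_train.max(axis=0) - x_train.min(axis=0)
    value_range[value_range == 0] = 1.0  # avoid division by zero for constant columns
    x_train_n = (x_train - mean) / value_range
    x_test_n = (x_test - mean) / value_range
    return x_train_n, x_test_n, y_train, y_test


# Hypothetical synthetic data in place of the asker's data / data_y
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))
data_y = rng.integers(0, 2, size=100)
x_train_n, x_test_n, y_train, y_test = train_test_split(data, data_y)
print(x_train_n.shape, x_test_n.shape)  # (70, 4) (30, 4)
```

With 100 samples and `test_size=0.3`, sklearn puts 30 samples in the test split and 70 in the training split.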