How do I stratify training and test data in Scikit-Learn?

Question

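I am trying to implement a classification algorithm for the Iris dataset (downloaded from Kaggle). In the "Species" column, the classes (Iris-setosa, Iris-versicolor, Iris-virginica) appear in sorted order. How can I stratify the training and test data with Scikit-Learn?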

Answer 1

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)

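where X is your data and y the corresponding labels,

test_size is the fraction of the data to hold out for testing (0.3 means 30%),

shuffle=True shuffles the data before splitting.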
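A minimal runnable sketch of the call above, assuming scikit-learn's bundled copy of the Iris data (`sklearn.datasets.load_iris`) stands in for the Kaggle CSV. The `random_state` value is an arbitrary addition that only makes the shuffle reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: load_iris() stands in for the Kaggle CSV (150 rows, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing and shuffle before splitting.
# random_state only fixes the shuffle so the result is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42
)

print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```

Note that shuffling alone only makes each class *likely* to appear in both sets; it does not guarantee equal class proportions.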

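To make sure the data is split evenly with respect to one column (for example, the class column), pass that column to the stratify parameter:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, stratify=X['YOUR_COLUMN_LABEL'])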
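A sketch of stratifying on a named DataFrame column, assuming a Kaggle-style table rebuilt from `load_iris()` with the class names in a column called `Species` (the column name and string labels are assumptions standing in for `'YOUR_COLUMN_LABEL'`). Here the class column is kept out of the features and passed to `stratify`, which is equivalent to `stratify=y`.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: rebuild a Kaggle-style table from the bundled Iris data,
# with the class names in a 'Species' column (sorted, 50 rows per class).
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]

X = df.drop(columns="Species")  # features only
y = df["Species"]               # class labels

# Stratify on the Species column so each class keeps its share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, stratify=df["Species"], random_state=0
)

# 15 rows per species in the test set (45 in total), 35 per species in training.
print(y_test.value_counts().sort_index())
```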

Answer 2

Use the stratify parameter of the train_test_split function.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

This keeps the proportion of each class in the training and test sets equal to its proportion in the full dataset.
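To check that the class ratios really are preserved, here is a small sketch on hypothetical imbalanced data (80 samples of class 0, 20 of class 1; the data and `random_state` are illustrative assumptions). `np.bincount` counts each integer label, so dividing by the length gives the class fractions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 samples of class 0, 20 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fraction of each class in the full data and in each split.
print(np.bincount(y) / len(y))              # [0.8 0.2]
print(np.bincount(y_train) / len(y_train))  # [0.8 0.2]
print(np.bincount(y_test) / len(y_test))    # [0.8 0.2]
```

With these sizes the split is exact: 60/15 samples in training and 20/5 in testing.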

Answer 3

Use sklearn.model_selection.train_test_split with the shuffle parameter:

shuffle: bool, default=True. Whether or not to shuffle the data before splitting. If shuffle=False, then stratify must be None.
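A sketch of why shuffling matters for sorted labels, assuming `load_iris()` as a stand-in for the question's sorted Species column (its rows are ordered 50 × class 0, 50 × class 1, 50 × class 2). With shuffle=False the split simply takes the last 30% of rows as the test set, and combining shuffle=False with stratify raises a ValueError.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load_iris() rows are sorted by class, like the sorted Species column.
X, y = load_iris(return_X_y=True)

# Without shuffling, the last 45 rows (all class 2) become the test set.
_, _, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)
print(np.unique(y_test))    # [2]
print(np.bincount(y_train)) # [50 50  5]

# Stratification is not supported without shuffling.
error_message = None
try:
    train_test_split(X, y, test_size=0.3, shuffle=False, stratify=y)
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```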
