Problem indexing y in train_test_split [duplicate]

Question  Votes: -1  Answers: 2

To split the data into training and test sets, I use:

X_train, X_test, y_train, y_test = train_test_split(X, y.iloc[:,1], test_size=0.3,random_state=seed, stratify=y)

But when I run it, I see this error (I have added the shapes of X and y below):

Traceback (most recent call last):
  ...
   , in <module>
   X_train, X_test, y_train, y_test = train_test_split(X, y.iloc[:,1], test_size=0.3,random_state=seed, stratify=y)
    AttributeError: 'numpy.ndarray' object has no attribute 'iloc'

EDIT: The shapes are:

Shape(X)= (284807, 28)
Shape(y)= (284807,)

Then I tried:

X_train, X_test, y_train, y_test = train_test_split(X, y[:,1], test_size=0.3,random_state=seed, stratify=y)

But I got:

IndexError: too many indices for array

How can I fix this?

python arrays testing shapes training-data
2 Answers
0
votes

As suggested in the comments, try replacing y.iloc[:,1] with y:

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.3,
                                                    random_state=seed)

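Edit: according to the documentation, stratify takes an array-like of class labels with one entry per sample, so the 1-D y can be passed as-is. The 2 * len(arrays) mentioned in the documentation is the length of the returned list, where the arrays are X and y.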
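
A minimal sketch of the stratified split on a 1-D label array, using small synthetic data in place of the asker's dataset. The array sizes, the class ratio, and the value of seed are assumptions for illustration only. It passes y itself as both the target and the stratify argument, and checks that the returned list holds 2 * len(arrays) items.

```python
import numpy as np
from sklearn.model_selection import train_test_split

seed = 42  # assumed value; the question does not show it

# Synthetic stand-ins: 100 samples, 28 features, imbalanced binary labels.
rng = np.random.default_rng(seed)
X = rng.normal(size=(100, 28))
y = np.array([0] * 90 + [1] * 10)  # 1-D, shape (100,), like Shape(y) in the question

# y is already 1-D, so pass it as-is; stratify=y keeps the class ratio in both splits.
splits = train_test_split(X, y, test_size=0.3, random_state=seed, stratify=y)
X_train, X_test, y_train, y_test = splits

print(len(splits))                  # 4, i.e. 2 * len((X, y))
print(X_train.shape, X_test.shape)  # (70, 28) (30, 28)
print(y_train.sum(), y_test.sum())  # positives in each split: 7 3
```

With stratify=y, the 10% share of positive labels carries over to both the training and the test portion.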


-1
votes

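iloc is an indexer on pandas DataFrame and Series objects; NumPy ndarrays do not have it.

To access elements, you can either use index and slice notation on the ndarray directly, or convert the ndarray to a pandas DataFrame, like this: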

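import pandas as pd

# nda: an existing 2-D ndarray with at least two columns (a 1-D array has no column 1)
df = pd.DataFrame(nda)
y = df.iloc[:, 1].to_numpy()  # select the second column (a Series) and convert it back to an ndarray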
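
The first option, plain ndarray indexing without pandas, can be sketched as follows. It also reproduces both errors from the question. The arrays y_1d and labels_2d are made-up examples, not the asker's data.

```python
import numpy as np

y_1d = np.array([0, 1, 0, 1])   # shape (4,): a single axis
labels_2d = np.array([[10, 0],
                      [11, 1],
                      [12, 0]])  # shape (3, 2): rows and columns

# A 2-D array supports [rows, columns] indexing; column 1 comes back as a 1-D array.
second_col = labels_2d[:, 1]
print(second_col)                # [0 1 0]

# A 1-D array has no column axis, so a second index raises IndexError.
try:
    y_1d[:, 1]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# ndarrays have no .iloc; that indexer exists only on pandas objects.
print(hasattr(y_1d, "iloc"))     # False
```

Since Shape(y) in the question is (284807,), y is already the 1-D target and needs no column selection.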

DataFrames offer a lot of flexibility for working with data. Since train_test_split takes arrays as arguments, you can use DataFrame.to_numpy to convert a DataFrame to an ndarray.
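
A small sketch of that round trip, assuming a toy DataFrame whose column names (f1, f2, label) are hypothetical and chosen only for illustration: select columns with pandas, convert with to_numpy, then split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data; the column names are hypothetical.
df = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "f2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "label": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

X = df[["f1", "f2"]].to_numpy()  # 2-D ndarray, shape (10, 2)
y = df["label"].to_numpy()       # 1-D ndarray, shape (10,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```

train_test_split also accepts pandas objects directly, so the to_numpy step is only needed when the downstream code expects ndarrays.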
