Accessing X_train columns after train_test_split

Question

After splitting the data I tried to rank the features, but when I try to access X_train.columns I get this error: 'numpy.ndarray' object has no attribute 'columns'.

from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
import pandas as pd

y=df['DIED'].values
x=df.drop('DIED',axis=1).values
X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)
print('X_train',X_train.shape)
print('X_test',X_test.shape)
print('y_train',y_train.shape)
print('y_test',y_test.shape)

bestfeatures = SelectKBest(score_func=chi2, k="all")
fit = bestfeatures.fit(X_train,y_train)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)   # AttributeError: 'numpy.ndarray' object has no attribute 'columns'

I know that train_test_split returns a NumPy array, but how should I deal with this?

python pandas split
Answer

Maybe this code makes it clear:

from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# Here I imitate your example data

df = pd.DataFrame(data = np.random.randint(100, size = (50,5)), columns = ['DIED']+[f'col_{i}' for i in range(4)])
df.head()

Out[1]:

        DIED    col_0   col_1   col_2   col_3
0       36      0       23      43      55
1       81      59      83      37      31
2       32      86      94      50      87
3       10      69      4       69      27
4       1       16      76      98      74

# df here is a DataFrame, with all its usual attributes, such as df.columns

y=df['DIED'].values
x=df.drop('DIED',axis=1).values   # <- .values returns a 2-D NumPy array, not a DataFrame, so it has no column names
x

Out[2]:

array([[ 0, 23, 43, 55],
       [59, 83, 37, 31],
       [86, 94, 50, 87],
       [69,  4, 69, 27],
       [16, 76, 98, 74],
       [17, 50, 52, 31],
       [95,  4, 56, 68],
       [82, 35, 67, 76],
       .....

# Now you can access columns by position, like this:

x[:,2]    # <- gives you access to the 3rd column

Out[3]:
array([43, 37, 50, 69, 98, 52, 56, 67, 81, 64, 48, 68, 14, 41, 78, 65, 11,
       86, 80,  1, 11, 32, 93, 82, 93, 81, 63, 64, 47, 81, 79, 85, 60, 45,
       80, 21, 27, 37, 87, 31, 97, 16, 59, 91, 20, 66, 66,  3,  9, 88])
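If you save the column labels before calling `.values`, you can still pick a column of the array by name. A minimal sketch, assuming the same layout as the example `df` above (`DIED` followed by `col_0` to `col_3`); `Index.get_loc` returns the integer position of a label, which is then used to slice the array.

```python
import numpy as np
import pandas as pd

# Same layout as the example above (random values)
df = pd.DataFrame(np.random.randint(100, size=(50, 5)),
                  columns=['DIED'] + [f'col_{i}' for i in range(4)])

features = df.drop('DIED', axis=1)
feature_names = features.columns   # save the labels before they are lost
x = features.values                # plain 2-D ndarray, no labels

# Look up a column's position by name, then slice the array with it
pos = feature_names.get_loc('col_2')
col_2 = x[:, pos]

print(pos)                                   # 2
print((col_2 == df['col_2'].values).all())   # True
```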

# Or you can convert the 2-D array back to a DataFrame:

pd.DataFrame(data = x, columns = df.columns[1:])

Out[4]:

    col_0   col_1   col_2   col_3
0   0       23      43      55
1   59      83      37      31
2   86      94      50      87
3   69      4       69      27
....
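Note that `df.columns[1:]` only gives the right labels because `DIED` happens to be the first column. A slightly more robust sketch takes the labels from the dropped frame instead, so it works wherever `DIED` sits; the frame below is made up to show the difference, and the 1-D target is wrapped in a `pd.Series`.

```python
import pandas as pd

# Example where 'DIED' is NOT the first column, so df.columns[1:] would mislabel
df = pd.DataFrame({'col_0': [1, 2, 3], 'DIED': [0, 1, 0], 'col_1': [7, 8, 9]})

feature_names = df.drop('DIED', axis=1).columns   # Index(['col_0', 'col_1'])
x = df.drop('DIED', axis=1).values
y = df['DIED'].values

# Rebuild labelled pandas objects from the plain arrays
X_df = pd.DataFrame(x, columns=feature_names)
y_s = pd.Series(y, name='DIED')

print(X_df)
print(df.columns[1:].tolist())   # ['DIED', 'col_1'], the wrong labels for x
```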

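Use the same approach for all the variables: X_train, X_test, y_train, y_test (the y arrays are 1-D, so pd.Series is the natural wrapper for them).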
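An alternative that avoids the conversion altogether: `train_test_split` returns the same kind of object it is given, so passing the DataFrame and Series directly (without `.values`) makes `X_train` a DataFrame, and `X_train.columns` works. Below is a sketch of the question's feature-ranking code on that basis. It uses random stand-in data, and the `Specs`/`Score` column names are chosen here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split

# Random stand-in data: non-negative features (chi2 requires them) and a 0/1 target
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 100, size=(50, 4)),
                  columns=[f'col_{i}' for i in range(4)])
df['DIED'] = rng.integers(0, 2, size=50)

# Keep pandas objects: no .values here
y = df['DIED']
x = df.drop('DIED', axis=1)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

bestfeatures = SelectKBest(score_func=chi2, k='all')
fit = bestfeatures.fit(X_train, y_train)

# X_train is still a DataFrame, so its column labels are available
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)
feature_scores = pd.concat([dfcolumns, dfscores], axis=1)
feature_scores.columns = ['Specs', 'Score']   # illustrative names
print(feature_scores.sort_values('Score', ascending=False))
```

Because the split is done on the pandas objects, X_test keeps the column labels too, and both halves keep the original row index of df, which makes it easy to trace rows back to the source data.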
