Automatic feature selection from a 3D numpy array using SelectKBest


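I'm new to machine learning and am working on a very complex problem. I have a 3D numpy array named "psd_data" containing EEG data from human subjects performing motor imagery trials. The array has shape (240, 16, 129), representing (trials, channels, PSD features). I also have a 1D numpy array named labels containing the label for each trial, with shape (240,).

I need to perform feature selection automatically and then classification, and so far I'm having trouble with the feature selection. I tried this: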

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

X = psd_data  #independent columns
y = labels    #target
#SelectKBest class to extract top 15 best features
bestfeatures = SelectKBest(score_func=chi2, k=15)
fit = bestfeatures.fit(X,y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
#concat two dataframes for better visualization 
featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score']  #naming the dataframe columns
print(featureScores.nlargest(15,'Score'))  #print 15 best features

But I get the error:

ValueError: Found array with dim 3. Estimator expected <= 2.

Do you have any suggestions on how to correctly manipulate the 3D array "psd_data" to get useful results?

python feature-selection array-broadcasting
2 Answers
0 votes

What worked for me is the following:

# Reduce features
import numpy as np
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif

topK = 20
SKB = SelectKBest(mutual_info_classif, k=topK)

# Flatten (instances, time steps, features) into 2D rows of features
num_instances, num_time_steps, num_features = train_data.shape
train_data = np.reshape(train_data, (-1, num_features))
# Repeat each label once per time step so it lines up with the flattened rows
# (np.resize would tile the labels and misalign them with the row order)
new_trClass = np.repeat(y_train, num_time_steps)
train_data_skb = SKB.fit_transform(train_data, new_trClass)
train_data_skb = np.reshape(train_data_skb, (num_instances, num_time_steps, topK))

# Apply the already fitted selector to the test set in the same way
num_instances, num_time_steps, num_features = test_data.shape
test_data = np.reshape(test_data, (-1, num_features))
test_data_skb = SKB.transform(test_data)
test_data_skb = np.reshape(test_data_skb, (num_instances, num_time_steps, topK))

# Boolean mask marking which feature columns were kept
feat_indices = SKB.get_support()

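In short, you need to reshape the array to two dimensions, because SelectKBest expects input of shape (n_samples, n_features). Hope this helps.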
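For the array in the question, one alternative (a rough sketch, not the method above) is to keep one row per trial and flatten the channel and PSD axes into a single feature axis, so that each label stays attached to exactly one row. The data here is random and only mimics the shapes from the question; f_classif is an assumed score function picked for illustration, and every name other than psd_data and labels is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the question's data (assumption):
# 240 trials, 16 channels, 129 PSD bins, binary labels.
rng = np.random.default_rng(0)
n_trials, n_channels, n_bins = 240, 16, 129
psd_data = rng.random((n_trials, n_channels, n_bins))
labels = rng.integers(0, 2, size=n_trials)

# One row per trial; every (channel, PSD bin) pair becomes its own column.
X = psd_data.reshape(n_trials, n_channels * n_bins)

# Keep the 15 highest-scoring columns.
selector = SelectKBest(score_func=f_classif, k=15)
X_selected = selector.fit_transform(X, labels)

# Map the flat column indices back to (channel, PSD bin) pairs.
flat_idx = selector.get_support(indices=True)
channels, bins = np.unravel_index(flat_idx, (n_channels, n_bins))

for ch, b, score in zip(channels, bins, selector.scores_[flat_idx]):
    print(f"channel {ch:2d}, bin {b:3d}, score {score:.3f}")
```

Because np.unravel_index maps each selected column back to one (channel, PSD bin) pair, the scores stay interpretable. If classification follows, fitting the selector on the training trials only avoids leaking label information into the test set.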


0 votes

Did you find anything useful?
