I am trying to compute principal components over rows of arrays that span multiple columns:
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.decomposition import PCA
df = pd.DataFrame(np.random.randn(5, 10), columns=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
df['arr1'] = df[['a', 'b', 'c', 'd', 'e']].values.tolist()
df['arr2'] = df[['f', 'g', 'h', 'i', 'j']].values.tolist()
df['arr1'] = [preprocessing.scale(row) for row in df['arr1']]
df['arr2'] = [preprocessing.scale(row) for row in df['arr2']]
df
X = df.loc[:, 'arr1':'arr2']
pca = PCA(.95)
pca.fit(X)
pca.transform(X)
This raises the following error:
ValueError: setting an array element with a sequence.
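To narrow this down, here is a minimal sketch that inspects the shape and dtypes of the slice passed to PCA. It uses placeholder arrays (zeros and ones) instead of my scaled values, which is an assumption made only to keep the example short. As far as I can tell, the slice has just two columns, each cell holding a whole array, which may be what the error is complaining about.

```python
import numpy as np
import pandas as pd

# Placeholder data (assumption): two columns whose cells each hold a 5-element array,
# mimicking the layout of arr1 and arr2 above.
df = pd.DataFrame({
    "arr1": [np.zeros(5) for _ in range(5)],
    "arr2": [np.ones(5) for _ in range(5)],
})

X = df.loc[:, "arr1":"arr2"]

# The slice is 5 rows by 2 columns, not 5 rows by 10 numeric features.
print(X.shape)
# Both columns are stored with the generic object dtype rather than float.
print(X.dtypes.tolist())
```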
I tried converting arr1 and arr2 with np.array(list(df.arr1)), but that only kept the first value of each array.
In my real dataset, each array has 200-300 elements.
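For reference, this is a minimal sketch of the input I think PCA needs, assuming every element of every array should become its own feature (so 5 rows by 10 columns in the toy example). Using np.vstack and np.hstack to get there is my assumption, not something I have confirmed is the recommended approach, and the seeded generator is only there to make the example repeatable.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.decomposition import PCA

# Toy data with the same layout as above (seeded only for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 10)), columns=list("abcdefghij"))
df["arr1"] = [preprocessing.scale(row) for row in df[list("abcde")].to_numpy()]
df["arr2"] = [preprocessing.scale(row) for row in df[list("fghij")].to_numpy()]

# Turn each array-valued column into a (n_rows, n_elements) float block,
# then put the blocks side by side so each array element is one feature.
blocks = [np.vstack(df[col].tolist()) for col in ["arr1", "arr2"]]
X = np.hstack(blocks)

# Keep enough components to explain 95% of the variance.
pca = PCA(0.95)
components = pca.fit_transform(X)
print(X.shape, components.shape)
```

With 200-300 elements per array, X would simply be wider (one column per element), assuming all arrays within a column have the same length.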