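I am applying PCA to a time series of shape T x N. I want to recompute the first PC from the loadings and compare it with the original PC. This is what I have tried so far: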
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Sample input data (replace this with actual data)
data = pd.DataFrame(np.random.rand(100, 20), columns=[f'Feature_{i}' for i in range(1, 21)])
# Standardize the data
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)
# Perform PCA with 10 components
pca = PCA(n_components=10)
pca.fit(data_standardized)
# Get loadings for the first principal component
loadings_first_component = pca.components_[0]
# Extract the square root of the explained variance for the first component
explained_variance_sqrt = np.sqrt(pca.explained_variance_[0])
# Scale the loadings by the square root of explained variance
loadings_first_component_scaled = loadings_first_component * explained_variance_sqrt
# Extract the first principal component
first_pc_original = pca.transform(data_standardized)[:, 0]
# Recompute the first principal component using loadings and input data
first_pc_recomputed = np.dot(data_standardized, loadings_first_component_scaled)
# Check if the recomputed first principal component is equal to the original
is_equal = np.allclose(first_pc_original, first_pc_recomputed)
print("Original First Principal Component:")
print(first_pc_original)
print("\nRecomputed First Principal Component:")
print(first_pc_recomputed)
print("\nAre they equal?", is_equal)
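However, the original PC and the recomputed PC are not the same. My end goal is to find the weights used in the linear combination that computes the first PC. Initially I thought pca.components_[0] held those weights, but I now suspect that is wrong.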
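You are projecting the standardized data onto the scaled loadings, when you actually want to project it onto the unscaled loadings. In other words, pca.components_[0] already holds the weights you are looking for.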
Replace
first_pc_recomputed = np.dot(data_standardized, loadings_first_component_scaled)
with
first_pc_recomputed = np.dot(data_standardized, loadings_first_component)
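To check the fix end to end, here is a minimal sketch that reuses the setup from the question. The seeded random data and the variable names (weights, pc1_recomputed, and so on) are my own choices for illustration. It also subtracts pca.mean_ explicitly, which is what pca.transform() does internally; for standardized data that mean is effectively zero.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative random data with the same shape as in the question (seed is arbitrary)
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.random((100, 20)))

pca = PCA(n_components=10).fit(X)

# Unscaled loadings: the weights of the linear combination for the first PC
weights = pca.components_[0]

pc1_original = pca.transform(X)[:, 0]
pc1_recomputed = (X - pca.mean_) @ weights
matches = np.allclose(pc1_original, pc1_recomputed)

# Scaling the loadings by sqrt(explained variance) scales the scores by the same factor
scale = np.sqrt(pca.explained_variance_[0])
pc1_scaled = (X - pca.mean_) @ (weights * scale)
scaled_matches = np.allclose(pc1_scaled, scale * pc1_original)

print("Unscaled loadings reproduce the first PC:", matches)
print("Scaled loadings give the first PC times sqrt(variance):", scaled_matches)
```

The second check shows why the original attempt failed: the scaled loadings give the first PC multiplied by a constant, not the PC itself.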
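PCA decomposes the (centered) data (X) into principal axes (A) and principal component scores (S):
X = SA.T
where A is an orthogonal matrix and .T denotes its transpose. Since A.T A = I, multiplying both sides on the right by A gives the quantity you are interested in:
S = XA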
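In sklearn these correspond to:
pca.components_ -> A.T (one principal axis per row, shape (n_components, n_features))
pca.transform() -> S
So to obtain S, simply project X onto A with np.dot(X, pca.components_.T). Note that pca.transform() subtracts pca.mean_ before projecting; for standardized data that mean is effectively zero, so the two results agree.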
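The same relation can be checked for all components at once. The following is a rough sketch on made-up random data; the names A, S and X_centered are mine and simply mirror the notation above. All 20 components are kept so that A is square and the reconstruction X = SA.T holds exactly; with fewer components it would only be an approximation.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data; centering by hand so that X matches the X in the formulas
rng = np.random.default_rng(1)
X = rng.random((100, 20))
X_centered = X - X.mean(axis=0)

# Keep every component so that A is a square orthogonal matrix
pca = PCA(n_components=20).fit(X_centered)

A = pca.components_.T   # shape (n_features, n_components), one axis per column
S = X_centered @ A      # scores, S = XA

scores_match = np.allclose(S, pca.transform(X_centered))
axes_orthonormal = np.allclose(A.T @ A, np.eye(A.shape[1]))
reconstruction_ok = np.allclose(S @ A.T, X_centered)

print(scores_match, axes_orthonormal, reconstruction_ok)
```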