How do I create a correlation matrix for PCA in Python?

Question · Votes: 1 · Answers: 1

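How do I create a correlation matrix for PCA in Python? Below, I build a DataFrame of the eigenvector loadings from pca.components_, but I don't know how to create the actual correlation matrix (i.e. how the variables correlate with the principal components). Any pointers?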

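Also, I've noticed that many of these eigenvector loadings are negative in Python. I'm trying to replicate a study done in Stata, and curiously the Python loadings seem to be negative where the Stata correlations are positive (see the attached image below of the correlation matrix I'm trying to replicate in Python). This is just something I've noticed. What is going on here?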

Stata-Created Correlation Matrix

Thanks in advance.

import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from dateutil.relativedelta import relativedelta
import blpinterface.blp_interface as blp
from scipy.stats import zscore
from sklearn.decomposition import PCA

#Set dates for analysis
startDate = "20000101"

#Construct tickers for analysis
tickers = ["USGG2YR Index", "USGG5YR Index", "USGG10YR Index", "USGG30YR Index", "USGGT10Y Index", ".30YREAL Index",
       "USGGBE10 Index", "USGGBE30 Index", ".RATEVOL1 Index", ".RATEVOL2 Index", "SPX Index", "S5INDU Index", "S5CONS Index", "VIX Index",
       ".DMFX Index", ".EMFX Index", "CL1 Comdty", "HG1 Comdty", "XAU Curncy"]

#Begin dataframe construction
mgr = blp.BLPInterface()

df = mgr.historicalRequest(tickers, "PX_LAST", startDate, "20160317")
df = df.dropna()
df = df.apply(zscore)

#Conduct PCA analysis
pca=PCA(n_components=3)
pca.fit(df) #Estimates the eigenvectors of the dataframe with 18x variables for data dating back to 2000
print(pd.DataFrame(pca.components_, columns=df.columns, index=["PC1", "PC2", "PC3"]).transpose()) #Eigenvectors with loadings, sorted from highest explained variance to lowest; df.columns replaces the undefined name tickersclean
print(pca.explained_variance_) #Eigenvalues (sum of squares of the distance between the projected data points and the origin along the eigenvector)
print(pca.explained_variance_ratio_) #Explained variance ratio (i.e. how much of the change in the variables in the time series is explained by change in the respective principal component); eigenvalue/(n variables)

#Project data onto the above loadings for each row in the time series
outputpca = pd.DataFrame(pca.transform(df), columns=['PCA%i' % i for i in range(3)], index=df.index)
outputpca.columns = ["PC1", "PC2", "PC3"]
print(outputpca) #Principal component time series, projecting the data onto the above loadings; this is the sum product of the data and the eigenvector loadings for all three PCs for each row
outputpca.plot(title="Principal Components")
plt.show()
python matrix correlation pca
1 Answer
0
votes

You can use the correlation function available in the numpy module. Example:

cor_mat1 = np.corrcoef(X_std.T)
eig_vals, eig_vecs = np.linalg.eig(cor_mat1)
print('Eigenvectors \n%s' %eig_vecs)
print('\nEigenvalues \n%s' %eig_vals)
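
The snippet above does not define X_std, so here is a minimal self-contained sketch of the same idea. The synthetic data is an assumption made purely for illustration; in the question it would be the z-scored DataFrame. It also swaps in np.linalg.eigh, which is intended for symmetric matrices such as a correlation matrix, and sorts the eigenpairs from largest to smallest eigenvalue, since eigh returns them in ascending order.

```python
import numpy as np

# Synthetic stand-in for the standardized data (assumption for illustration):
# rows are observations, columns are variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:, 1] += 0.8 * X[:, 0]  # make two of the variables correlated
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Variable-by-variable correlation matrix (np.corrcoef expects variables in rows).
cor_mat = np.corrcoef(X_std.T)

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# so reorder to put the largest eigenvalue (first principal component) first.
eig_vals, eig_vecs = np.linalg.eigh(cor_mat)
order = np.argsort(eig_vals)[::-1]
eig_vals = eig_vals[order]
eig_vecs = eig_vecs[:, order]

print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)
```

Each column of eig_vecs is one eigenvector, and the eigenvalues of a correlation matrix sum to the number of variables.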

The link shows an application of PCA using the correlation matrix.
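
For the part of the question about how the variables correlate with the principal components, the following is a rough sketch rather than a definitive recipe. It assumes synthetic data and made-up column names (a, b, c, d) in place of the Bloomberg series. It computes the variable-to-component correlations two ways: directly from the component scores, and from the eigenvectors scaled by the square root of their eigenvalues. It also touches on the sign issue: the sign of an eigenvector is arbitrary, so two packages can legitimately return a component with all signs flipped. The convention used at the end (largest-magnitude entry of each component made positive) is only one possible choice and is not claimed to be what Stata does.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the z-scored DataFrame (assumption for illustration).
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 4))
raw[:, 1] += 0.8 * raw[:, 0]
raw[:, 3] -= 0.5 * raw[:, 2]
data = pd.DataFrame(raw, columns=["a", "b", "c", "d"])
data = (data - data.mean()) / data.std(ddof=0)  # same scaling as scipy.stats.zscore

pc_names = ["PC1", "PC2", "PC3"]
pca = PCA(n_components=3).fit(data)
scores = pd.DataFrame(pca.transform(data), columns=pc_names, index=data.index)

# Route 1: Pearson correlation of every variable with every component score.
corr_direct = pd.DataFrame({pc: data.corrwith(scores[pc]) for pc in pc_names})

# Route 2: eigenvector * sqrt(eigenvalue) / standard deviation of the variable.
# explained_variance_ uses n - 1 in the denominator, hence ddof=1 here.
corr_loadings = pd.DataFrame(
    pca.components_.T * np.sqrt(pca.explained_variance_)
    / data.std(ddof=1).to_numpy()[:, None],
    index=data.columns,
    columns=pc_names,
)

# Optional sign alignment: flip each component so that its largest-magnitude
# correlation is positive. Flipping a whole column does not change the PCA.
values = corr_loadings.to_numpy()
signs = np.sign(values[np.abs(values).argmax(axis=0), np.arange(values.shape[1])])
aligned = corr_loadings * signs

print(corr_direct.round(3))
print(aligned.round(3))
```

If a column of this matrix matches the Stata output except for its sign, multiplying that component (and its scores) by -1 gives an equivalent solution.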
