Python: assign a covariance calculated row-wise


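I'm trying to assign covariance values to a column based on a dataframe I have. df is ~400k records x 30+ columns. Both data series that serve as input to COV() are aligned within a single record (~400k records). I'd like to assign the column names as lists and then perform the operation as an array. I can do this for the corresponding means, but covariance seems elusive.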

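Also, as a workaround, I can build the covariance in a clunkier, manual way by writing out every step, but it isn't dynamic. Below is a dataframe example (the first 5 records, with 4 months of acct returns and 4 months of benchmark returns; the actual df has 12 months of acct returns and 12 months of benchmark returns). I've tried various iterations of COV(), but because both data sets (account returns / benchmark returns) are on the same record, I haven't found a good way to create a function.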

import numpy as np
import pandas as pd

df = pd.DataFrame({'ACCT_ID':['A_12345','A_23456','A_34567','A_45678','A_56789'],
                  'Acct_m1_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065],
                  'Acct_m2_RoR':[0.025, 0.035, 0.055, 0.0127, 0.065],
                  'Acct_m3_RoR':[0.065, -0.075, -0.015, 0.0527, 0.015],
                  'Acct_m4_RoR':[-0.009, 0.015, -0.065, 0.0827, -0.025],
                  'BCHMK_m1_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065],
                  'BCHMK_m2_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065],
                  'BCHMK_m3_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065],
                  'BCHMK_m4_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065]})

List of column headers:

a1=['Acct_m1_RoR','Acct_m2_RoR','Acct_m3_RoR','Acct_m4_RoR','Acct_m5_RoR','Acct_m6_RoR','Acct_m7_RoR','Acct_m8_RoR','Acct_m9_RoR','Acct_m10_RoR','Acct_m11_RoR','Acct_m12_RoR']
b1=['BCHMK_m1_RoR','BCHMK_m2_RoR','BCHMK_m3_RoR','BCHMK_m4_RoR','BCHMK_m5_RoR','BCHMK_m6_RoR','BCHMK_m7_RoR','BCHMK_m8_RoR','BCHMK_m9_RoR','BCHMK_m10_RoR','BCHMK_m11_RoR','BCHMK_m12_RoR']
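The two header lists can be generated instead of typed out, which also keeps the month-by-month pairing explicit. This is a minimal sketch, assuming the columns follow the `Acct_m{i}_RoR` / `BCHMK_m{i}_RoR` pattern shown above; `n_months` is a hypothetical parameter (it would be 4 for the sample frame).

```python
# Hypothetical: assumes columns are named Acct_m{i}_RoR / BCHMK_m{i}_RoR for months 1..n_months
n_months = 12
a1 = [f"Acct_m{i}_RoR" for i in range(1, n_months + 1)]   # account return columns
b1 = [f"BCHMK_m{i}_RoR" for i in range(1, n_months + 1)]  # benchmark columns, same month order
```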

df['acct_mean'] = df[a1].mean(axis=1)
df['bchmk_mean'] = df[b1].mean(axis=1)
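One way to get the covariance with the same list-of-columns approach used for the means is to pull both column groups out as NumPy arrays and let broadcasting do the row-wise work. The sketch below assumes `a1[i]` and `b1[i]` refer to the same month; `rowwise_cov` is a hypothetical helper name, not a pandas API. It defaults to `ddof=0` so that it matches the division by 12 in the workaround.

```python
import numpy as np
import pandas as pd


def rowwise_cov(frame, a_cols, b_cols, ddof=0):
    """Hypothetical helper: covariance of a_cols vs. b_cols, computed per row.

    a_cols[i] is paired with b_cols[i]; ddof=0 divides by n (like the /12 in
    the manual workaround), ddof=1 divides by n - 1 (sample covariance).
    """
    if len(a_cols) != len(b_cols):
        raise ValueError("a_cols and b_cols must have the same length")
    a = frame[list(a_cols)].to_numpy(dtype=float)  # shape (rows, months)
    b = frame[list(b_cols)].to_numpy(dtype=float)
    n = a.shape[1]
    # Subtract each row's own mean; keepdims=True keeps shape (rows, 1) so it broadcasts
    a_dev = a - a.mean(axis=1, keepdims=True)
    b_dev = b - b.mean(axis=1, keepdims=True)
    # Sum the month-by-month products of deviations, then normalize
    return pd.Series((a_dev * b_dev).sum(axis=1) / (n - ddof), index=frame.index)


# Sample frame from the question (4 months instead of 12)
df = pd.DataFrame({'ACCT_ID': ['A_12345', 'A_23456', 'A_34567', 'A_45678', 'A_56789'],
                   'Acct_m1_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
                   'Acct_m2_RoR': [0.025, 0.035, 0.055, 0.0127, 0.065],
                   'Acct_m3_RoR': [0.065, -0.075, -0.015, 0.0527, 0.015],
                   'Acct_m4_RoR': [-0.009, 0.015, -0.065, 0.0827, -0.025],
                   'BCHMK_m1_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
                   'BCHMK_m2_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
                   'BCHMK_m3_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
                   'BCHMK_m4_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065]})
a1 = [f"Acct_m{i}_RoR" for i in range(1, 5)]
b1 = [f"BCHMK_m{i}_RoR" for i in range(1, 5)]

df["cov"] = rowwise_cov(df, a1, b1)
```

In this 5-row sample every row's benchmark return is the same in all four months, so the benchmark does not vary within a row and each covariance comes out as (numerically) zero; real 12-month data will give non-zero values. Since the work is done on whole arrays, there is no Python-level loop over the ~400k rows.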

Semi-manual workaround:

df['cov'] = (((df['Acct_m1_RoR'] - df['acct_mean']) * (df['BCHMK_m1_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m2_RoR'] - df['acct_mean']) * (df['BCHMK_m2_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m3_RoR'] - df['acct_mean']) * (df['BCHMK_m3_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m4_RoR'] - df['acct_mean']) * (df['BCHMK_m4_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m5_RoR'] - df['acct_mean']) * (df['BCHMK_m5_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m6_RoR'] - df['acct_mean']) * (df['BCHMK_m6_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m7_RoR'] - df['acct_mean']) * (df['BCHMK_m7_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m8_RoR'] - df['acct_mean']) * (df['BCHMK_m8_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m9_RoR'] - df['acct_mean']) * (df['BCHMK_m9_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m10_RoR'] - df['acct_mean']) * (df['BCHMK_m10_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m11_RoR'] - df['acct_mean']) * (df['BCHMK_m11_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m12_RoR'] - df['acct_mean']) * (df['BCHMK_m12_RoR'] - df['bchmk_mean']))) / 12
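The semi-manual formula can also be made dynamic by looping over the paired column names instead of typing out twelve terms. This is a minimal sketch with a hypothetical `manual_cov` helper and made-up two-month demo data; the loop runs over months, not rows, so each iteration is still a vectorized column operation.

```python
import pandas as pd


def manual_cov(frame, a_cols, b_cols):
    """Hypothetical helper: the semi-manual formula, written as a loop over months.

    Pairs a_cols[i] with b_cols[i] and divides by the number of months
    (population covariance, like the original /12).
    """
    acct_mean = frame[a_cols].mean(axis=1)
    bchmk_mean = frame[b_cols].mean(axis=1)
    total = pd.Series(0.0, index=frame.index)
    # Each iteration handles one month for all rows at once
    for a_col, b_col in zip(a_cols, b_cols):
        total += (frame[a_col] - acct_mean) * (frame[b_col] - bchmk_mean)
    return total / len(a_cols)


# Made-up two-month example where the benchmark actually varies
demo = pd.DataFrame({
    "Acct_m1_RoR": [0.01, 0.02],
    "Acct_m2_RoR": [0.03, -0.01],
    "BCHMK_m1_RoR": [0.02, 0.00],
    "BCHMK_m2_RoR": [0.04, 0.01],
})
demo["cov"] = manual_cov(demo, ["Acct_m1_RoR", "Acct_m2_RoR"], ["BCHMK_m1_RoR", "BCHMK_m2_RoR"])
```

Dividing by the number of months gives the population covariance. `np.cov` and pandas' `Series.cov` default to the sample covariance (dividing by n - 1), so their results differ by a factor of n/(n - 1) unless `ddof=0` is passed.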