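Suppose I observe the weight of each stock in a set of portfolios. I want to compute the "active share" measure, defined as

$$\text{active\_share}_{ij} = \text{active\_share}_{ji} = \sum_{k=1}^{N} \left| \omega^k_i - \omega^k_j \right|,$$

where $N$ is the number of all stocks in the market, $\omega^k_i$ is the weight of stock $k$ in portfolio $i$, and the vertical bars denote the absolute difference. Note that the sum runs over all stocks in the market: for example, if portfolio $i$ holds stock $k$ but portfolio $j$ does not, then $\omega^k_j = 0$ and the absolute difference is $|\omega^k_i - 0|$.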
In my data structure, I observe three relevant pieces of information: portfolio, stock and weight. In my MWE this looks like
portfolio stock weight
0 P1 stock1 0.3
1 P1 stock2 0.6
2 P1 stock3 0.1
3 P2 stock2 0.1
4 P2 stock3 0.2
5 P2 stock4 0.3
6 P2 stock5 0.1
7 P2 stock8 0.3
8 P3 stock3 0.4
9 P3 stock4 0.6
I would like to produce
active_share
portfolio_i portfolio_j
P1 P1 0.0
P2 1.6
P3 1.8
P2 P1 1.6
P2 0.0
P3 1.0
P3 P1 1.8
P2 1.0
P3 0.0
Take the pair (P1, P2) as an example: |0.3-0| + |0.6-0.1| + |0.1-0.2| + |0-0.3| + |0-0.1| + |0-0.3| = 1.6
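The arithmetic for this pair can be checked with a few lines of plain Python. This is only a minimal sketch for verifying the definition, not a performant solution; the helper name `active_share` and the dictionary layout are assumptions made for illustration.

```python
# Weights per portfolio, copied from the example data above.
p1 = {"stock1": 0.3, "stock2": 0.6, "stock3": 0.1}
p2 = {"stock2": 0.1, "stock3": 0.2, "stock4": 0.3, "stock5": 0.1, "stock8": 0.3}

def active_share(w_i, w_j):
    """Sum of absolute weight differences over the union of held stocks.

    A stock missing from one portfolio counts with weight 0 there.
    """
    stocks = set(w_i) | set(w_j)
    return sum(abs(w_i.get(s, 0.0) - w_j.get(s, 0.0)) for s in stocks)

# Rounded to hide floating-point noise in the last digits.
print(round(active_share(p1, p2), 10))
```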
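The few lines of code below produce this result. However, I am facing very large datasets and need better performance.
Do you see a way to optimize the computation, for example by exploiting the fact that the relation is symmetric, i.e. the measure for (i,j) is the same as for (j,i)?
MWE: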
import pandas as pd

# generate example data
data = {
'portfolio': ['P1','P1','P1','P2','P2','P2','P2','P2','P3','P3'],
'stock': ['stock1','stock2','stock3','stock2','stock3','stock4','stock5','stock8','stock3','stock4'],
'weight':[0.3, 0.6, 0.1, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4, 0.6]}
df = pd.DataFrame(data)
# help dataframe
df['key'] = df.groupby('portfolio').ngroup()
portfolio_list = df['portfolio'].unique()
groups = df['key'].unique()
index = pd.MultiIndex.from_product([portfolio_list, groups], names=['portfolio','key'])
join = pd.DataFrame(index=index, columns=[]).reset_index()
# perform some sort of outer join
composite = df.merge(join, on='key', suffixes=('_i','_j'), how='inner')
composite = composite.loc[:,composite.columns!='key']
composite.rename(columns={'weight':'weight_i'}, inplace=True)
# identify whether counterparty has same holdings
composite = composite.merge(df, left_on=['portfolio_j','stock'], right_on=['portfolio','stock'], how='left', suffixes=('_i','_j'))
composite = composite.loc[:,~composite.columns.isin(['portfolio','key'])]
composite.rename(columns={'weight':'weight_j'}, inplace=True)
# compute the sum of absolute differences for overlapping portfolio
composite['abs_weight_difference'] = (composite['weight_i'] - composite['weight_j']).abs()
result = composite.groupby(['portfolio_i','portfolio_j'])['abs_weight_difference'].sum().to_frame('sum_overlap')
# compute the sum of weights of stocks that are in portfolio i but not in j
result['sum_unique_i'] = composite.loc[composite['weight_j'].isnull()].groupby(['portfolio_i','portfolio_j'])['weight_i'].sum()
# add sum of weights of stocks (1) in portfolio overlap, (2) distinct to portfolio i, (3) distinct to portfolio j
result = result.reset_index()
result = result.merge(result, left_on=['portfolio_i','portfolio_j'], right_on=['portfolio_j','portfolio_i'])
result = result.loc[:,['portfolio_i_x','portfolio_j_x','sum_overlap_x','sum_unique_i_x','sum_unique_i_y']]
result.columns = ['sum_unique_j' if col=='sum_unique_i_y' else col[:-2] for col in result.columns]
result = result.fillna(0)
result.set_index(['portfolio_i','portfolio_j'], inplace=True)
result = result.sum(axis=1).to_frame(name='active_share')
Edit: as a reference point for the future, this currently takes 5.82 minutes on a small subset of my dataset.
You can first pivot the data and then use broadcasting to compute the pairwise differences:
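import numpy as np
import pandas as pd

# wide matrix: one row per portfolio, one column per stock, 0 where a stock is not held
a = df.pivot_table(index='portfolio',
                   columns='stock',
                   values='weight',
                   fill_value=0)
idx = a.index
a = a.to_numpy()
# broadcast to all portfolio pairs, then sum |difference| over the stock axis
pd.DataFrame(np.abs(a[:,None] - a[None,:]).sum(axis=-1),
             index=idx, columns=idx)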
Output:
portfolio P1 P2 P3
portfolio
P1 0.0 1.6 1.8
P2 1.6 0.0 1.0
P3 1.8 1.0 0.0
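To see what the broadcast does, it may help to look at the array shapes on a tiny example. The sketch below uses a hypothetical 2-portfolio, 3-stock weight matrix (not the question's data). The intermediate array has one entry per (portfolio, portfolio, stock) triple, which is why memory grows quickly with the number of portfolios and stocks.

```python
import numpy as np

# Hypothetical weight matrix: rows are portfolios, columns are stocks.
a = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.25, 0.75]])

left = a[:, None]    # shape (2, 1, 3)
right = a[None, :]   # shape (1, 2, 3)
diff = left - right  # shape (2, 2, 3): diff[i, j, k] = a[i, k] - a[j, k]

# Summing |diff| over the last (stock) axis leaves one value per portfolio pair.
pairwise = np.abs(diff).sum(axis=-1)  # shape (2, 2)
print(diff.shape, pairwise.shape)
```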
From there, you can stack to get the expected output from the question:
pd.DataFrame(np.abs(a[:,None] - a[None,:]).sum(axis=-1),
             index=idx, columns=idx).stack()
which gives:
portfolio portfolio
P1 P1 0.0
P2 1.6
P3 1.8
P2 P1 1.6
P2 0.0
P3 1.0
P3 P1 1.8
P2 1.0
P3 0.0
dtype: float64
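If the index levels should carry the names used in the question (`portfolio_i`, `portfolio_j`) and the values should sit in a column called `active_share`, one possible way is to rename the row and column index before stacking. This is a sketch under that assumption; the variable names `square` and `long` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = {
    'portfolio': ['P1','P1','P1','P2','P2','P2','P2','P2','P3','P3'],
    'stock': ['stock1','stock2','stock3','stock2','stock3','stock4','stock5','stock8','stock3','stock4'],
    'weight': [0.3, 0.6, 0.1, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4, 0.6]}
df = pd.DataFrame(data)

a = df.pivot_table(index='portfolio', columns='stock',
                   values='weight', fill_value=0)
idx = a.index
w = a.to_numpy()

# Give rows and columns distinct names so the stacked index is unambiguous.
square = pd.DataFrame(np.abs(w[:, None] - w[None, :]).sum(axis=-1),
                      index=idx.rename('portfolio_i'),
                      columns=idx.rename('portfolio_j'))
long = square.stack().to_frame('active_share')
print(long)
```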
Update: since you have a large number of both stocks and portfolios, you can loop over the stocks instead, which reduces the memory requirement:
a = df.pivot_table(index='portfolio',
                   columns='stock',
                   values='weight',
                   fill_value=0)
idx = a.index
# float accumulator; each iteration adds one (portfolio x portfolio) slice
ret = pd.DataFrame(0.0, index=idx, columns=idx)
for col in a:
    u = a[col].to_numpy()
    ret += np.abs(u - u[:, None])
Result:
portfolio P1 P2 P3
portfolio
P1 0.0 1.6 1.8
P2 1.6 0.0 1.0
P3 1.8 1.0 0.0
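The symmetry mentioned in the question can also be used directly: it is enough to compute the pairs with i < j and mirror them. The following is a sketch of that idea using `np.triu_indices`; it is an alternative to the two variants above, not a measured improvement, and the intermediate `w[i] - w[j]` still holds one row per pair, so for very large inputs it would need to be processed in chunks.

```python
import numpy as np
import pandas as pd

data = {
    'portfolio': ['P1','P1','P1','P2','P2','P2','P2','P2','P3','P3'],
    'stock': ['stock1','stock2','stock3','stock2','stock3','stock4','stock5','stock8','stock3','stock4'],
    'weight': [0.3, 0.6, 0.1, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4, 0.6]}
df = pd.DataFrame(data)

a = df.pivot_table(index='portfolio', columns='stock',
                   values='weight', fill_value=0)
idx = a.index
w = a.to_numpy()
n = len(idx)

# Index pairs strictly above the diagonal, i.e. every unordered pair once.
i, j = np.triu_indices(n, k=1)
upper = np.abs(w[i] - w[j]).sum(axis=1)

# Fill both triangles from the single computation; the diagonal stays 0.
out = np.zeros((n, n))
out[i, j] = upper
out[j, i] = upper
active_share = pd.DataFrame(out, index=idx, columns=idx)
print(active_share)
```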
There seems to be a simpler solution that is also faster:
import pandas as pd

# generate example data
data = {
'portfolio': ['P1','P1','P1','P2','P2','P2','P2','P2','P3','P3'],
'stock': ['stock1','stock2','stock3','stock2','stock3','stock4','stock5','stock6','stock3','stock4'],
'weight':[0.3, 0.6, 0.1, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4, 0.6]}
holdings = pd.DataFrame(data)
# compute absolute difference for stocks in both portfolios
stock_join = holdings.merge(holdings, on=['stock'], suffixes=('_i','_j'), how='inner')
stock_join['difference'] = (stock_join['weight_i'] - stock_join['weight_j']).abs()
# compute the absolute difference for stocks that are distinct to portfolio i and j
result = stock_join.groupby(['portfolio_i','portfolio_j'])[['weight_i','weight_j','difference']].sum()
result.columns = [f'ovlp_sum_{col}' for col in result.columns]
# note: this relies on the weights of every portfolio summing to 1.0
result['sum_distinct_i'] = 1.0-result['ovlp_sum_weight_i']
result['sum_distinct_j'] = 1.0-result['ovlp_sum_weight_j']
# add the parts together
result['active_share'] = result['sum_distinct_i'] + result['ovlp_sum_difference'] + result['sum_distinct_j']
result = result.loc[:,'active_share']
This evaluates to
portfolio_i portfolio_j
P1 P1 0.0
P2 1.6
P3 1.8
P2 P1 1.6
P2 0.0
P3 1.0
P3 P1 1.8
P2 1.0
P3 0.0
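Two caveats of the merge-based approach seem worth checking before relying on it. First, the `1.0 - overlap` shortcut assumes the weights of each portfolio sum to 1. Second, two portfolios without any common stock produce no row in the self-merge, so that pair is missing from the result. The sketch below uses hypothetical holdings (not the data above) to illustrate one way to guard against both; the fill value 2.0 follows from the sum-to-1 assumption (1 + 1).

```python
import pandas as pd

# Hypothetical holdings: P1 and P3 (and P2 and P3) share no stock.
holdings = pd.DataFrame({
    'portfolio': ['P1', 'P1', 'P2', 'P2', 'P3'],
    'stock': ['s1', 's2', 's2', 's3', 's4'],
    'weight': [0.5, 0.5, 0.5, 0.5, 1.0],
})

# Guard: the shortcut below is only valid if each portfolio sums to 1.
totals = holdings.groupby('portfolio')['weight'].sum()
assert ((totals - 1.0).abs() < 1e-9).all(), "weights must sum to 1 per portfolio"

# Same computation as above: overlap via self-merge on stock.
pairs = holdings.merge(holdings, on='stock', suffixes=('_i', '_j'))
pairs['difference'] = (pairs['weight_i'] - pairs['weight_j']).abs()
sums = pairs.groupby(['portfolio_i', 'portfolio_j'])[['weight_i', 'weight_j', 'difference']].sum()
active_share = (1.0 - sums['weight_i']) + sums['difference'] + (1.0 - sums['weight_j'])

# Pairs without a common stock are absent so far; add them with value 2.0.
names = totals.index
full = pd.MultiIndex.from_product([names, names],
                                  names=['portfolio_i', 'portfolio_j'])
active_share = active_share.reindex(full, fill_value=2.0).rename('active_share')
print(active_share)
```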