I have stock trade data for different individuals and want to calculate each individual's daily profit and loss (PnL).
Individual Stock Date Trade
1 AAPL 2022-01-01 +1
1 AAPL 2022-01-02 +2
2 GOOG 2022-01-01 +10
2 GOOG 2022-01-02 -5
The trades are combined with stock price data:
Stock Date Price
AAPL 2022-01-01 102
AAPL 2022-01-02 98
AAPL 2022-01-03 96
GOOG 2022-01-01 31
GOOG 2022-01-02 35
GOOG 2022-01-03 40
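Currently, I loop over each date, calculate the PnL on yesterday's position, and update the current position with that day's trades (the DataFrame below is for 2022-01-02):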
Individual Stock Pos_EOD Pos_Today Price_EOD Price_Today PnL
1 AAPL 1 3 102 98 -4
2 GOOG 10 5 31 35 40
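In symbols, the per-date calculation in the table above can be written as follows. The notation is introduced here only for illustration: $q_{i,s,t}$ is the end-of-day position of individual $i$ in stock $s$ on date $t$, $P_{s,t}$ is the price, and $\Delta_{i,s,t}$ is the trade on that date.

```latex
\mathrm{PnL}_{i,t} = \sum_{s} q_{i,s,t-1}\,\bigl(P_{s,t} - P_{s,t-1}\bigr),
\qquad
q_{i,s,t} = q_{i,s,t-1} + \Delta_{i,s,t}
```

For individual 1 on 2022-01-02 this gives $1 \times (98 - 102) = -4$, matching the PnL column.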
After computing the DataFrame above for every date, the final combined DataFrame is:
Individual Date PnL
1 2022-01-02 -4
1 2022-01-03 -6
2 2022-01-02 40
2 2022-01-03 25
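However, with a large number of individuals (100,000+) and stocks (50,000+), this is quite slow. Intuitively, I feel the whole operation could be vectorized, but I am not sure how to do it. Any suggestions would be very welcome. The code below reproduces the example.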
import pandas as pd

trades = pd.DataFrame(
    data = [[1, "AAPL", "2022-01-01", 1],
            [1, "AAPL", "2022-01-02", 2],
            [2, "GOOG", "2022-01-01", 10],
            [2, "GOOG", "2022-01-02", -5]],
    columns = ["Individual", "Stock", "Date", "Trade"])

prices = pd.DataFrame(
    data = [["AAPL", "2022-01-01", 102],
            ["AAPL", "2022-01-02", 98],
            ["AAPL", "2022-01-03", 96],
            ["GOOG", "2022-01-01", 31],
            ["GOOG", "2022-01-02", 35],
            ["GOOG", "2022-01-03", 40]],
    columns = ["Stock", "Date", "Price"])

dates = pd.unique(prices["Date"]) #define all days with prices
start_pos = trades[trades["Date"] == dates[0]] #initial positions
start_pos = pd.merge(start_pos, prices[prices["Date"]==dates[0]].drop(columns = "Date")) #merge with prices on the first day
start_pos = start_pos.rename(columns = {"Price": "Price_EOD", "Trade": "Pos_EOD"}) #rename columns to allow for merge again later
daily_pnl_dict = {}
for i in dates[1:]:
    start_pos = pd.merge(start_pos, trades[trades["Date"]==i].drop(columns = "Date"), on = ["Individual", "Stock"], how = "outer") #merge with new trades
    start_pos = pd.merge(start_pos, prices[prices["Date"]==i].drop(columns = "Date"), on = "Stock", how = "left") #merge with new prices
    start_pos["PnL"] = (start_pos["Price"] - start_pos["Price_EOD"]) * start_pos["Pos_EOD"] #calculate PnL as position x price change
    daily_pnl = start_pos.groupby("Individual")["PnL"].sum() #calculate PnL for each individual
    daily_pnl_dict[i] = daily_pnl #save daily PnL
    start_pos["Price_EOD"] = start_pos["Price"] #prepare for new loop, set today's price as yesterday's price
    start_pos["Pos_EOD"] = start_pos["Pos_EOD"].fillna(0) + start_pos["Trade"].fillna(0) #update position with today's trade; a missing trade or a new position counts as 0
    start_pos = start_pos.drop(columns = ["Price", "Trade"]) #make place for new trades and prices in next iteration
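
One possible way to vectorize the loop above is sketched here. It is not a tuned solution: it assumes dates are ISO-formatted strings (so they sort chronologically), that every traded stock has a price on every date, and that a dates-by-(Individual, Stock) table of the pairs that actually traded fits in memory. The names `price_wide`, `trade_wide`, `pos_prev`, `diff_aligned`, `pnl_wide` and `daily_pnl_long` are introduced only for this illustration.

```python
import pandas as pd

trades = pd.DataFrame(
    [[1, "AAPL", "2022-01-01", 1],
     [1, "AAPL", "2022-01-02", 2],
     [2, "GOOG", "2022-01-01", 10],
     [2, "GOOG", "2022-01-02", -5]],
    columns=["Individual", "Stock", "Date", "Trade"],
)
prices = pd.DataFrame(
    [["AAPL", "2022-01-01", 102],
     ["AAPL", "2022-01-02", 98],
     ["AAPL", "2022-01-03", 96],
     ["GOOG", "2022-01-01", 31],
     ["GOOG", "2022-01-02", 35],
     ["GOOG", "2022-01-03", 40]],
    columns=["Stock", "Date", "Price"],
)

# One row per date, one column per stock; diff() gives the day-over-day price change.
price_wide = prices.pivot(index="Date", columns="Stock", values="Price").sort_index()
price_diff = price_wide.diff()

# One row per date, one column per (Individual, Stock) pair that ever traded.
# Dates without a trade are filled with 0 so the cumulative sum carries positions forward.
trade_wide = trades.pivot_table(
    index="Date", columns=["Individual", "Stock"],
    values="Trade", aggfunc="sum", fill_value=0,
).reindex(price_wide.index, fill_value=0)

# End-of-day position is the running sum of trades; shift(1) gives yesterday's position.
pos_prev = trade_wide.cumsum().shift(1)

# Repeat each stock's price change once per (Individual, Stock) column.
diff_aligned = price_diff.reindex(columns=pos_prev.columns.get_level_values("Stock"))
diff_aligned.columns = pos_prev.columns

# PnL per pair, then summed over stocks for each individual.
pnl_wide = (pos_prev * diff_aligned).T.groupby(level="Individual").sum().T

# Drop the first date (no previous position) and convert to long format.
daily_pnl_long = pnl_wide.iloc[1:].reset_index().melt(
    id_vars="Date", var_name="Individual", value_name="PnL"
)
print(daily_pnl_long)
```

The idea is that a cumulative sum of trades replaces the position update inside the loop, and a single element-wise multiplication replaces the per-date merges. If the wide table is too large, the same steps could be applied per batch of individuals.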