I have a DataFrame that looks like this, containing price, side, and volume parameters from multiple exchanges.
df = pd.DataFrame({
    'price_ex1' : [9380.59650, 9394.85206, 9397.80000],
    'side_ex1' : ['bid', 'bid', 'ask'],
    'size_ex1' : [0.416, 0.053, 0.023],
    'price_ex2' : [9437.24045, 9487.81185, 9497.81424],
    'side_ex2' : ['bid', 'bid', 'ask'],
    'size_ex2' : [10.0, 556.0, 23.0]
})
df
    price_ex1 side_ex1  size_ex1   price_ex2 side_ex2  size_ex2
0  9380.59650      bid     0.416  9437.24045      bid      10.0
1  9394.85206      bid     0.053  9487.81185      bid     556.0
2  9397.80000      ask     0.023  9497.81424      ask      23.0
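I have more than two exchanges. I want the index to be the union of all prices from all exchanges (i.e. the union of price_ex1, price_ex2, etc.), sorted from highest to lowest. Then, for each exchange, I want to create two side columns (one for bid, one for ask) filled from that exchange's size parameter. In the output, cells with no data should be NaN.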
I'm not sure which pandas function is best for this, whether pivot or melt, or how to use it when I have more than one column to flatten.
Thanks for your help!
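You can try something like this.
First, build a data frame from the data you showed us and save it as a CSV file named 'example.csv', with the following columns: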
price_ex1 side_ex1 size_ex1 price_ex2 side_ex2 size_ex2
import pandas as pd
import numpy as np
df = pd.read_csv('example.csv')
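# Split the frame into one sub-frame per exchange.
df1 = df[['price_ex1','side_ex1','size_ex1']]
df2 = df[['price_ex2','side_ex2','size_ex2']]
# DataFrame.append was removed in pandas 2.0, so use pd.concat.
# sort=True orders the columns alphabetically (price_ex1, price_ex2,
# side_ex1, side_ex2, size_ex1, size_ex2); the rename below relies on this.
df3 = pd.concat([df1, df2], sort=True)
# Each row has exactly one non-NaN price; collapse both price columns into one.
df4 = df3[['price_ex1','price_ex2']]
arr = df4.values
df3['price_ex1'] = arr[~np.isnan(arr)].astype(float)
df3.drop(columns=['price_ex2'], inplace=True)
# These names are placeholders: 'bid_ex1'/'ask_ex1' hold side_ex1/side_ex2
# and 'bid_ex2'/'ask_ex2' hold size_ex1/size_ex2.
df3.columns = ['price', 'bid_ex1', 'ask_ex1', 'bid_ex2', 'ask_ex2']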
def change(bid_ex1, ask_ex1, bid_ex2, ask_ex2, col_name):
    # Assuming the alphabetically sorted column order before the rename,
    # the arguments hold:
    #   bid_ex1 -> side of ex1, ask_ex1 -> side of ex2,
    #   bid_ex2 -> size of ex1, ask_ex2 -> size of ex2.
    # Return the size when the side matches the requested column; otherwise
    # return the side label (or NaN), which is replaced with NaN afterwards.
    # NaN never compares equal to anything, so `x != np.nan` is always True
    # and is not a useful guard; only the side comparison is needed.
    if col_name == 'bid_ex1_col':
        return bid_ex2 if bid_ex1 == 'bid' else bid_ex1
    if col_name == 'ask_ex1_col':
        return bid_ex2 if bid_ex1 == 'ask' else ask_ex1
    if col_name == 'ask_ex2_col':
        return ask_ex2 if ask_ex1 == 'ask' else ask_ex1
    if col_name == 'bid_ex2_col':
        return ask_ex2 if ask_ex1 == 'bid' else ask_ex1
df3['bid_ex1_col'] = df3.apply(lambda row: change(row['bid_ex1'],row['ask_ex1'],row['bid_ex2'],row['ask_ex2'], 'bid_ex1_col'), axis=1)
df3['ask_ex1_col'] = df3.apply(lambda row: change(row['bid_ex1'],row['ask_ex1'],row['bid_ex2'],row['ask_ex2'], 'ask_ex1_col'), axis=1)
df3['ask_ex2_col'] = df3.apply(lambda row: change(row['bid_ex1'],row['ask_ex1'],row['bid_ex2'],row['ask_ex2'], 'ask_ex2_col'), axis=1)
df3['bid_ex2_col'] = df3.apply(lambda row: change(row['bid_ex1'],row['ask_ex1'],row['bid_ex2'],row['ask_ex2'], 'bid_ex2_col'), axis=1)
df3.drop(columns=['bid_ex1', 'ask_ex1', 'bid_ex2', 'ask_ex2'], inplace=True)
df3.replace(to_replace='ask', value=np.nan,inplace=True)
df3.replace(to_replace='bid', value=np.nan,inplace=True)
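As an alternative to the row-by-row approach, here is a minimal sketch that reshapes to long format and then pivots. It is only one possible way to do this, not the method used in the code above. The names `exchanges`, `long_df`, `result`, and the `bid_ex1`-style output labels are my own assumptions, as is the choice to sum sizes when the same price appears more than once for the same exchange and side. It also assumes every column follows the `<field>_<exchange>` naming from the question, so it should extend to more than two exchanges.

```python
import pandas as pd

df = pd.DataFrame({
    'price_ex1': [9380.59650, 9394.85206, 9397.80000],
    'side_ex1': ['bid', 'bid', 'ask'],
    'size_ex1': [0.416, 0.053, 0.023],
    'price_ex2': [9437.24045, 9487.81185, 9497.81424],
    'side_ex2': ['bid', 'bid', 'ask'],
    'size_ex2': [10.0, 556.0, 23.0],
})

# Discover the exchange suffixes ('ex1', 'ex2', ...) from the price columns.
exchanges = [c.split('_', 1)[1] for c in df.columns if c.startswith('price_')]

# Stack every exchange into one long table with a target column label
# such as 'bid_ex1' or 'ask_ex2'.
frames = []
for ex in exchanges:
    frames.append(pd.DataFrame({
        'price': df[f'price_{ex}'],
        'column': df[f'side_{ex}'] + '_' + ex,
        'size': df[f'size_{ex}'],
    }))
long_df = pd.concat(frames, ignore_index=True)

# Pivot so that each price is a row and each side/exchange pair is a column.
# Combinations with no data stay NaN. aggfunc='sum' is an assumption for
# handling repeated prices within one side/exchange pair.
result = (
    long_df.pivot_table(index='price', columns='column',
                        values='size', aggfunc='sum')
    .sort_index(ascending=False)  # highest price first
)
result.columns.name = None
print(result)
```

The price index here is the union of all price columns, sorted from highest to lowest, and each exchange gets one bid column and one ask column holding its sizes.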