This is the input DataFrame.
input = pd.DataFrame({'order': {0: '1',
1: '1',
2: '2',
3: '2',
4: '3',
5: '3'},
'start': {0: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
1: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
2: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
3: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
4: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
5: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC')},
'end': {0: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
1: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC'),
2: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
3: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC'),
4: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
5: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC')},
'quant': {0: 10, 1: 10, 2: 20, 3: 30, 4: 40, 5: 50},
'price': {0: 44, 1: 44, 2: 5, 3: 6, 4: 8, 5: 8}})
I need to widen this DataFrame based on order, so that the values for each order end up in their own columns.
Can anyone help me?
First, try not to use input as a variable name. It is a built-in function in Python, and assigning to that name shadows the built-in.
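As a small illustration of the shadowing problem (a minimal sketch; the function name shadow_demo is hypothetical and only exists for this example), rebinding the name makes the built-in unreachable by that name in the same scope:

```python
def shadow_demo():
    # Hypothetical helper: rebinding `input` hides the built-in in this scope.
    input = {"order": ["1", "2"]}
    try:
        # This now tries to call the dict, not the built-in function.
        input("Enter a value: ")
    except TypeError as exc:
        return str(exc)


message = shadow_demo()
print(message)
```

Using a neutral name such as df avoids the issue entirely.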
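To widen this DataFrame, set 'start', 'end' and 'order' as the index, then unstack the 'order' level so that each order gets its own columns.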
import pandas as pd

# df is the DataFrame from the question, renamed from `input`
# Convert 'start' and 'end' to datetime (a no-op if they already hold Timestamps)
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
# Move 'start', 'end' and 'order' into the index
df.set_index(['start', 'end', 'order'], inplace=True)
# Unstack 'order': the columns become a (variable, order) MultiIndex
df_unstacked = df.unstack(level='order')
# Swap the levels so 'order' is on top, then sort to group each order's columns
df_wide = df_unstacked.swaplevel(axis=1).sort_index(axis=1)
# Flatten the column MultiIndex into names such as 'price.1'
df_flat_columns = df_wide.copy()
df_flat_columns.columns = [f'{var}.{order}' for order, var in df_flat_columns.columns]
# Turn 'start' and 'end' back into regular columns
df_flat_columns.reset_index(inplace=True)
Output:
print(df_flat_columns.to_string())
                      start                       end  price.1  quant.1  price.2  quant.2  price.3  quant.3
0 2023-04-01 04:00:00+00:00 2023-05-01 04:00:00+00:00       44       10        5       20        8       40
1 2023-05-01 04:00:00+00:00 2023-06-01 04:00:00+00:00       44       10        6       30        8       50
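For comparison, here is a self-contained sketch of the same reshaping done with DataFrame.pivot instead of set_index plus unstack. This is an alternative, not the approach used above. It assumes pandas 1.1 or later (where pivot accepts a list for index), and the names starts, ends and wide are made up for the example. Unlike the inplace version, it leaves df unchanged.

```python
import pandas as pd

# Rebuild the question's data under the name `df` (avoids shadowing `input`).
starts = pd.to_datetime(["2023-04-01 04:00", "2023-05-01 04:00"], utc=True)
ends = pd.to_datetime(["2023-05-01 04:00", "2023-06-01 04:00"], utc=True)
df = pd.DataFrame({
    "order": ["1", "1", "2", "2", "3", "3"],
    "start": list(starts) * 3,
    "end": list(ends) * 3,
    "quant": [10, 10, 20, 30, 40, 50],
    "price": [44, 44, 5, 6, 8, 8],
})

# pivot returns a new frame; its columns are a (variable, order) MultiIndex.
wide = df.pivot(index=["start", "end"], columns="order", values=["quant", "price"])

# Put `order` on top and sort so each order's columns sit together.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Flatten ('1', 'price') into 'price.1' and restore start/end as columns.
wide.columns = [f"{var}.{order}" for order, var in wide.columns]
wide = wide.reset_index()

print(wide.to_string())
```

Note that pivot raises an error if a (start, end, order) combination appears more than once, so this sketch only applies when each order has at most one row per period, as in the sample data.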