Assigning an entire column to a Pandas DataFrame with a MultiIndex?


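I have a DataFrame with a MultiIndex (called midx_df), and I want to assign the values of an entire column from another DataFrame with a single-level index (called sour_df) to midx_df.

All index values of sour_df exist in the top level of midx_df's index. I need to specify a level-1 index value (for example 'Buy') and add/modify the values of all rows that share that level-1 value.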

For example:

import pandas as pd

beg_min = pd.to_datetime('2023/03/18 18:50', yearfirst=True)
end_min = pd.to_datetime('2023/03/18 18:53', yearfirst=True)
minutes = pd.date_range(start=beg_min, end=end_min, freq='1min')
actions = ['Buy', 'Sell']
m_index = pd.MultiIndex.from_product([minutes, actions], names=['time', 'action'])
sour_df = pd.DataFrame(index=minutes, columns=['price'])
sour_df.index.rename('time', inplace=True)
sour_df.loc[minutes[0], 'price'] = 'b0'
sour_df.loc[minutes[1], 'price'] = 'b1'
sour_df.loc[minutes[3], 'price'] = 'b2'

midx_df = pd.DataFrame(index=m_index, columns=['price'])
print(midx_df)

midx_df.loc[(beg_min, 'Buy'), 'price'] = 123    # works but only for one row!
midx_df.loc[(end_min, 'Buy')]['price'] = 124    # doesn't work!
print(midx_df)

midx_df.loc[(slice(None), 'Buy'), 'price'] = sour_df    # doesn't work!
print(midx_df)

midx_df.loc[(slice(None), 'Buy'), 'price'] = sour_df['price']    # doesn't work!
print(midx_df)

#midx_df.loc[(slice(None), 'Buy')]['price'] = sour_df['price']    # doesn't work!
#print(midx_df)

midx_df.loc[pd.IndexSlice[:, 'Buy'], :] = sour_df    # doesn't work!
print(midx_df)

Please tell me the correct way to do this, thanks!

python pandas dataframe slice multi-index
1 Answer (2 votes)

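This is an interesting question. The problem is that your indexes are not aligned: midx_df is indexed by ('time', 'action'), whereas sour_df is indexed by 'time' only, so pandas cannot match the values to the right rows.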
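To make the mismatch concrete, here is a minimal sketch that rebuilds empty frames with the same index layout as in the question and compares the index level names. The variable names target_names, source_names, and same_depth are illustrative and not part of the original post.

```python
import pandas as pd

# Rebuild frames with the same index layout as in the question.
minutes = pd.date_range('2023-03-18 18:50', '2023-03-18 18:53', freq='1min')
m_index = pd.MultiIndex.from_product(
    [minutes, ['Buy', 'Sell']], names=['time', 'action']
)
midx_df = pd.DataFrame(index=m_index, columns=['price'])
sour_df = pd.DataFrame(index=minutes.rename('time'), columns=['price'])

# Compare the index structure of the assignment target and the source.
target_names = list(midx_df.index.names)   # two levels
source_names = list(sour_df.index.names)   # one level
same_depth = midx_df.index.nlevels == sour_df.index.nlevels

print(target_names, source_names, same_depth)
```

Because the two indexes differ in depth, label-based alignment has no full (time, action) key on the source side to match against.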

You have to reindex sour_df so that its index has the same structure as the index of midx_df. pd.concat can be used for this:

midx_df.loc[(slice(None), 'Buy'), 'price'] = \
    pd.concat([sour_df], keys=['Buy'], names=['action']).swaplevel()
print(midx_df)

# Output
                           price
time                action      
2023-03-18 18:50:00 Buy       b0
                    Sell     NaN
2023-03-18 18:51:00 Buy       b1
                    Sell     NaN
2023-03-18 18:52:00 Buy      NaN
                    Sell     NaN
2023-03-18 18:53:00 Buy       b2
                    Sell     NaN

Or use pd.MultiIndex.from_product:

midx_df.loc[(slice(None), 'Buy'), 'price'] = \
    sour_df.set_index(pd.MultiIndex.from_product([sour_df.index, ['Buy']]))

Details:

>>> midx_df.loc[(slice(None), 'Buy'), 'price']
time                 action
2023-03-18 18:50:00  Buy       NaN
2023-03-18 18:51:00  Buy       NaN
2023-03-18 18:52:00  Buy       NaN
2023-03-18 18:53:00  Buy       NaN
Name: price, dtype: object

>>> pd.concat([sour_df], keys=['Buy'], names=['action']).swaplevel()
                           price
time                action      
2023-03-18 18:50:00 Buy       b0
2023-03-18 18:51:00 Buy       b1
2023-03-18 18:52:00 Buy      NaN
2023-03-18 18:53:00 Buy       b2

>>> sour_df.set_index(pd.MultiIndex.from_product([sour_df.index, ['Buy']]))
                        price
time                         
2023-03-18 18:50:00 Buy    b0
2023-03-18 18:51:00 Buy    b1
2023-03-18 18:52:00 Buy   NaN
2023-03-18 18:53:00 Buy    b2

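Now the index of the value being assigned lines up with the index of the target rows, so pandas can set the values correctly.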
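As an alternative to the alignment-based approaches above, here is a minimal sketch (not from the original answer) that sidesteps alignment altogether: it selects the 'Buy' rows with a boolean mask, looks up each row's time in sour_df with reindex, and assigns the raw values by position. The names buy_mask and buy_times are illustrative, and the sketch assumes the index of sour_df is unique.

```python
import pandas as pd

# Same sample data as in the question.
minutes = pd.date_range('2023-03-18 18:50', '2023-03-18 18:53', freq='1min')
m_index = pd.MultiIndex.from_product(
    [minutes, ['Buy', 'Sell']], names=['time', 'action']
)
midx_df = pd.DataFrame(index=m_index, columns=['price'])

sour_df = pd.DataFrame(index=minutes.rename('time'), columns=['price'])
sour_df.loc[minutes[0], 'price'] = 'b0'
sour_df.loc[minutes[1], 'price'] = 'b1'
sour_df.loc[minutes[3], 'price'] = 'b2'

# Boolean mask for the rows whose 'action' level equals 'Buy'.
buy_mask = midx_df.index.get_level_values('action') == 'Buy'

# The 'time' labels of those rows, in row order.
buy_times = midx_df.index.get_level_values('time')[buy_mask]

# Look up each time in sour_df and assign the plain values positionally,
# so no index alignment takes place during the assignment.
midx_df.loc[buy_mask, 'price'] = sour_df['price'].reindex(buy_times).to_numpy()

print(midx_df)
```

Since the right-hand side is a plain NumPy array, the values are written in row order; times missing from sour_df simply come back as NaN from reindex.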
