How do I apply a set intersection with the previous row in a pandas MultiIndex DataFrame?


Here is my MultiIndex DataFrame:

import pandas as pd

df = pd.DataFrame({'category':['A','A','A','B','B','B'],
                   'row':[1,2,3,1,2,3],
                   'unique':[{0,1,2},{2,3,4},{1,5,6},{0,1,2},{3,4,5},{4,5,6}],
                   'new':[{0,1,2},{3,4},{5,6},{0,1,2},{3,4,5},{6}]}).set_index(['category','row'])

It looks like this:

Category  row  unique    new      
A          1   {0,1,2}  {0,1,2}
           2   {2,3,4}    {3,4}
           3   {1,5,6}    {5,6}   

B          1   {0,1,2}  {0,1,2}
           2   {3,4,5}  {3,4,5}
           3   {4,5,6}      {6}

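Within each category, I am trying to intersect the previous row's 'new' with the current row's 'unique', for example A.1['new'] intersect A.2['unique'].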

Expected result:

Category  row  unique    new      Previous Row Returned
A          1   {0,1,2}  {0,1,2}          None
           2   {2,3,4}    {3,4}           {2}
           3   {1,5,6}    {5,6}            {}

B          1   {0,1,2}  {0,1,2}          None
           2   {3,4,5}  {3,4,5}            {}
           3   {4,5,6}      {6}         {4,5}

How should I approach this?

python pandas dataframe intersection
1 Answer

Working with non-scalar values (such as sets) in pandas tends to be slow, but if you need it:

# shift 'new' down by one row within each category (first index level)
df['Previous Row Returned'] = df.groupby(level=0)['new'].shift()
# boolean mask: only rows that have a previous row (non-missing values)
mask = df['Previous Row Returned'].notnull()
# intersect the current row's 'unique' with the previous row's 'new'
f = lambda x: x['unique'].intersection(x['Previous Row Returned'])
df.loc[mask, 'Previous Row Returned'] = df.loc[mask].apply(f, axis=1)
print(df)
                 unique        new Previous Row Returned
category row
A        1    {0, 1, 2}  {0, 1, 2}                   NaN
         2    {2, 3, 4}     {3, 4}                   {2}
         3    {1, 5, 6}     {5, 6}                    {}
B        1    {0, 1, 2}  {0, 1, 2}                   NaN
         2    {3, 4, 5}  {3, 4, 5}                    {}
         3    {4, 5, 6}        {6}                {4, 5}
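
As a possible alternative, the same shift-then-intersect idea can be sketched with a plain list comprehension instead of a row-wise apply. This is only a minimal sketch under the assumption that the DataFrame is built exactly as in the question; `prev_new` is a name introduced here for illustration, and no timing comparison has been made.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'row': [1, 2, 3, 1, 2, 3],
    'unique': [{0, 1, 2}, {2, 3, 4}, {1, 5, 6}, {0, 1, 2}, {3, 4, 5}, {4, 5, 6}],
    'new': [{0, 1, 2}, {3, 4}, {5, 6}, {0, 1, 2}, {3, 4, 5}, {6}],
}).set_index(['category', 'row'])

# Previous row's 'new' within each category; the first row of each
# group has no predecessor and gets a missing value.
prev_new = df.groupby(level=0)['new'].shift()

# Intersect pairwise in plain Python; store None where there is no previous row.
df['Previous Row Returned'] = [
    u & p if isinstance(p, set) else None
    for u, p in zip(df['unique'], prev_new)
]

print(df)
```

The `isinstance(p, set)` check plays the role of the boolean mask in the answer above: rows without a previous row are left empty rather than intersected.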