Drop the last row in a group based on a condition

Question  Votes: 0  Answers: 2

I want to drop the last row in each group based on a condition. This is what I have tried so far:

df=pd.read_csv('file')
grp = df.groupby('id')
for idx, i in grp:
   df= df[df['column2'].index[-1] == 'In']

     id   product  date
0   220   in       2014-09-01
1   220   out      2014-09-03
2   220   in       2014-10-16
3   826   in       2014-11-11
4   826   out      2014-12-09
5   826   out      2014-05-19
6   901   in       2014-09-01
7   901   out      2014-10-05
8   901   out      2014-11-01

When I do this, all I get is: KeyError: False
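A minimal sketch of where this error likely comes from, assuming the condition column is named `product` as in the sample data (the snippet above uses `column2`). `.index[-1]` returns the last index label rather than the last value, so comparing it with a string gives a single `False`, and `df[False]` is then treated as a lookup for a column named `False`.

```python
import pandas as pd

# Sample data re-created from the question (date column omitted for brevity)
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": ["in", "out", "in", "in", "out", "out", "in", "out", "out"],
})

# .index[-1] is the last index label (8 here), not the last value of the column
last_label = df["product"].index[-1]
condition = last_label == "in"  # a single bool, not a row-wise boolean mask

try:
    df[condition]  # pandas looks for a column literally named False
    error = None
except KeyError as exc:
    error = exc

print(type(error).__name__, error)
```

Filtering rows needs a boolean Series with one value per row, not one scalar.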

The output I want is:

     id   product  date
0   220   in       2014-09-01
1   220   out      2014-09-03
3   826   in       2014-11-11
4   826   out      2014-12-09
6   901   in       2014-09-01
7   901   out      2014-10-05
python pandas dataframe boolean rows
2 Answers
1 vote

A simple way is to add skipfooter=1 when reading the .csv file:

df = pd.read_csv(file, skipfooter=1, engine='python')
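As a rough illustration of what `skipfooter=1` does, the sketch below reads the sample data from an in-memory string (a hypothetical stand-in for the real file). It drops only the final line of the whole file, so only the last `id` group loses a row.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file from the question
csv_text = """id,product,date
220,in,2014-09-01
220,out,2014-09-03
220,in,2014-10-16
826,in,2014-11-11
826,out,2014-12-09
826,out,2014-05-19
901,in,2014-09-01
901,out,2014-10-05
901,out,2014-11-01""".strip()

# skipfooter is only supported by the Python parsing engine
trimmed = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

# Rows remaining per id: only the final group is one row shorter
sizes = trimmed.groupby("id").size()
print(sizes)
```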

0 votes

If you only want to remove the last in (that is, any in row that is not the first row of its id group):

# Keep the first row of each id, plus every row whose product is not 'in'
df = df[~df['id'].duplicated() | df['product'].ne('in')]
print(df)
    id product        date
0  220      in  2014-09-01
1  220     out  2014-09-03
3  826      in  2014-11-11
4  826     out  2014-12-09
5  826     out  2014-05-19
6  901      in  2014-09-01
7  901     out  2014-10-05
8  901     out  2014-11-01
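To see how that mask is put together, here is a small sketch on the sample data (the DataFrame is re-created here for illustration only). `duplicated()` flags every row after the first occurrence of an `id`, so its negation is `True` only for the first row of each group.

```python
import pandas as pd

# Sample data re-created from the question
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": ["in", "out", "in", "in", "out", "out", "in", "out", "out"],
    "date": ["2014-09-01", "2014-09-03", "2014-10-16",
             "2014-11-11", "2014-12-09", "2014-05-19",
             "2014-09-01", "2014-10-05", "2014-11-01"],
})

is_first = ~df["id"].duplicated()   # True for the first row of each id
not_in = df["product"].ne("in")     # True wherever product is not 'in'
keep = is_first | not_in            # a row survives if either holds

kept_index = df[keep].index.tolist()
print(kept_index)  # [0, 1, 3, 4, 5, 6, 7, 8]
```

Only row 2 is dropped, because it is an `in` row that is not the first row of its group.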

To match your expected output:

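# Position of each row within its id group (0, 1, 2, ...)
s = df.groupby('id').cumcount()
# Keep an 'in' row in first position and an 'out' row in second position
df = df[(s.eq(0) & df['product'].eq('in')) |
        (s.eq(1) & df['product'].eq('out'))]
print(df)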
    id product        date
0  220      in  2014-09-01
1  220     out  2014-09-03
3  826      in  2014-11-11
4  826     out  2014-12-09
6  901      in  2014-09-01
7  901     out  2014-10-05
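The expected output happens to equal the original frame with the final row of every `id` group removed, so another possible approach (a sketch that is not part of the answers above) is to count rows from the end of each group with `cumcount(ascending=False)`. The variable names below are illustrative.

```python
import pandas as pd

# Sample data re-created from the question
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": ["in", "out", "in", "in", "out", "out", "in", "out", "out"],
    "date": ["2014-09-01", "2014-09-03", "2014-10-16",
             "2014-11-11", "2014-12-09", "2014-05-19",
             "2014-09-01", "2014-10-05", "2014-11-01"],
})

# Count rows from the end of each group: the last row of every id gets 0
from_end = df.groupby("id").cumcount(ascending=False)
is_last = from_end.eq(0)

# Variant A: drop the last row of every group unconditionally
dropped_all = df[~is_last]

# Variant B: drop the last row of a group only when its product is 'in'
dropped_if_in = df[~(is_last & df["product"].eq("in"))]

print(dropped_all.index.tolist())    # [0, 1, 3, 4, 6, 7]
print(dropped_if_in.index.tolist())  # [0, 1, 3, 4, 5, 6, 7, 8]
```

Variant A reproduces the expected output on this data. Variant B follows the "based on a condition" wording more literally and removes only row 2.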