Say I have a DataFrame that looks like this:
Categories Values
0 Category 0 1
1 Category 0 0
2 Category 0 -1
3 Category 0 0
4 Category 1 1
5 Category 1 0
6 Category 1 -1
7 Category 1 0
8 Category 2 1
9 Category 2 0
10 Category 2 -1
11 Category 2 0
12 Category 3 -1
13 Category 3 0
14 Category 3 0
15 Category 3 1
16 Category 4 -1
17 Category 4 0
18 Category 4 0
19 Category 4 1
20 Category 5 -1
21 Category 5 0
22 Category 5 0
23 Category 5 1
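I want a time-efficient way to get two things for the last non-zero entry of Values in each group:
(1): its index,
(2): its value.
The expected output for (1) is [2, 6, 10, 15, 19, 23] as a pandas Series.
The expected output for (2) is [-1, -1, -1, 1, 1, 1] as a pandas Series.
Thanks in advance, guys.
Edit: added the Python code that generates the DataFrame above: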
import pandas as pd
n = 4
m = 3
df = pd.DataFrame({'Categories': [f'Category {i//n}' for i in range(2*m*n)],
'Values' : [1,0,-1,0]*m+ [-1,0,0,1]*m})
Use boolean indexing to keep only the rows whose Values are not equal to 0, then call DataFrame.drop_duplicates on the Categories column, keeping only the last duplicate:
df1 = df[df['Values'].ne(0)].drop_duplicates('Categories', keep='last')
print (df1)
Categories Values
2 Category 0 -1
6 Category 1 -1
10 Category 2 -1
15 Category 3 1
19 Category 4 1
23 Category 5 1
print (df1.index.tolist())
[2, 6, 10, 15, 19, 23]
print (df1['Values'].tolist())
[-1, -1, -1, 1, 1, 1]
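The question asks for both results as pandas Series rather than lists. A minimal sketch of that last step, assuming the same sample data as in the question (the names `last_nonzero`, `idx_series` and `val_series` are illustrative, not from the original answer):

```python
import pandas as pd

n, m = 4, 3
df = pd.DataFrame({
    'Categories': [f'Category {i // n}' for i in range(2 * m * n)],
    'Values': [1, 0, -1, 0] * m + [-1, 0, 0, 1] * m,
})

# Keep the non-zero rows, then the last remaining row of each category.
last_nonzero = df[df['Values'].ne(0)].drop_duplicates('Categories', keep='last')

# (1) the original row labels and (2) the values, both as Series.
idx_series = pd.Series(last_nonzero.index, name='index')
val_series = last_nonzero['Values'].reset_index(drop=True)

print(idx_series.tolist())  # [2, 6, 10, 15, 19, 23]
print(val_series.tolist())  # [-1, -1, -1, 1, 1, 1]
```

Dropping `reset_index(drop=True)` would keep the original row labels as the index of `val_series`, which may be more convenient if both pieces of information are needed together.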
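One way to solve this, using transform so that every row carries its group's result:
df['value']=df.groupby('Categories')['Values'].transform(lambda x: x.loc[x[::-1].ne(0).idxmax()])
df['index']=df.groupby('Categories')['Values'].transform(lambda x: x[::-1].ne(0).idxmax())
Note: this may not be the most efficient way to solve it, but I think it is a simple solution for you.
Output: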
Categories Values value index
0 Category 0 1 -1 2
1 Category 0 0 -1 2
2 Category 0 -1 -1 2
3 Category 0 0 -1 2
4 Category 1 1 -1 6
5 Category 1 0 -1 6
6 Category 1 -1 -1 6
7 Category 1 0 -1 6
8 Category 2 1 -1 10
9 Category 2 0 -1 10
10 Category 2 -1 -1 10
11 Category 2 0 -1 10
12 Category 3 -1 1 15
13 Category 3 0 1 15
14 Category 3 0 1 15
15 Category 3 1 1 15
16 Category 4 -1 1 19
17 Category 4 0 1 19
18 Category 4 0 1 19
19 Category 4 1 1 19
20 Category 5 -1 1 23
21 Category 5 0 1 23
22 Category 5 0 1 23
23 Category 5 1 1 23
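The table above repeats the result on every row of a group. If one result per category is enough, the same reversed-mask idea can be applied once per group. The sketch below is an assumption-laden variant, not the answer's own code: it uses `idxmax`, which returns the index label of the first `True`, and it adds a guard for groups with no non-zero entry. The helper `last_nonzero_label` and the small sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: group 'b' has no non-zero value at all.
df = pd.DataFrame({
    'Categories': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],
    'Values': [1, -1, 0, 0, 0, 0, 2],
})

def last_nonzero_label(s):
    """Return the index label of the last non-zero entry of s, or NaN if none."""
    mask = s.ne(0)
    if not mask.any():
        return np.nan
    # On the reversed mask, the first True is the last non-zero entry.
    return mask[::-1].idxmax()

labels = (
    df.groupby('Categories')['Values']
      .apply(last_nonzero_label)
      .dropna()        # drop groups without any non-zero entry
      .astype(int)
)
values = df.loc[labels.tolist(), 'Values']

print(labels.to_dict())  # {'a': 1, 'c': 6}
print(values.tolist())   # [-1, 2]
```

Without the guard, an all-zero group would silently report its last row, because `idxmax` on a mask with no `True` still returns the first label it sees.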
I would first filter down to the non-zero rows, then use groupby:
In [11]: df1 = df[df.Values != 0]
In [12]: df1[df1.groupby("Categories")["Values"].transform(lambda x: x == x.iloc[-1])]
Out[12]:
Categories Values
2 Category 0 -1
6 Category 1 -1
10 Category 2 -1
15 Category 3 1
19 Category 4 1
23 Category 5 1
In [13]: df1[df1.groupby("Categories")["Values"].transform(lambda x: x == x.iloc[-1])].index
Out[13]: Int64Index([2, 6, 10, 15, 19, 23], dtype='int64')
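One caveat with comparing against `x.iloc[-1]`: the comparison matches every row in the group that shares the last value, so a group whose non-zero values are -1, 1, -1 would yield two rows. A possible alternative, sketched here on hypothetical data, is to take the last row of each group by position with `groupby(...).tail(1)` after the same filter:

```python
import pandas as pd

# Hypothetical sample: group 'a' contains -1 twice among its non-zero values.
df = pd.DataFrame({
    'Categories': ['a', 'a', 'a', 'a', 'b', 'b'],
    'Values': [-1, 1, -1, 0, 0, 3],
})

# Same first step as above: drop the zero rows.
nonzero = df[df['Values'] != 0]

# tail(1) keeps the last row of each group by position, not by value.
last_rows = nonzero.groupby('Categories').tail(1)

print(last_rows.index.tolist())      # [2, 5]
print(last_rows['Values'].tolist())  # [-1, 3]
```

On the data from the question both approaches give the same six rows, since no group repeats its last non-zero value.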