I have this DataFrame:
import pandas as pd

df1 = pd.DataFrame({'subject': [2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3],
                    'trial': [2,12,13,14,15,16,17,18,3,4,5,9,10,11,12,15],
                    'diff_rows': ['nan',10,1,1,1,1,1,1,'nan',1,1,4,1,1,1,3]})
print(df1)
    subject  trial  diff_rows
0         2      2        nan
1         2     12         10
2         2     13          1
3         2     14          1
4         2     15          1
5         2     16          1
6         2     17          1
7         2     18          1
8         3      3        nan
9         3      4          1
10        3      5          1
11        3      9          4
12        3     10          1
13        3     11          1
14        3     12          1
15        3     15          3
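For each subject I want the second row if its diff_rows is greater than 1, and the third row otherwise. I have been trying a few options, but none of them seem to work because a Series has no .nth attribute: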
s = df1.groupby(['subject']).apply(lambda frame: frame.nth(1) if frame.diff_rows.nth(1).gt(1) else frame.nth(2))
s = df1.loc[df1.groupby(['subject']).apply(lambda frame: frame.nth(1) if frame.diff_rows.nth(1).gt(1) else frame.nth(2)), ('subject', 'trial')].to_dict(orient='record')
My expected output is:
   subject  trial  diff_rows
0        2     12         10
1        3      5          1
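Based on your logic, we can use the fact that diff_rows is NaN in the first row of each group, so the row right after a NaN is the second row of its group. This relies on diff_rows holding real NaN values; the string 'nan' is not detected by isna(), so convert the column first if needed (for example with pd.to_numeric(df1.diff_rows, errors='coerce')):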
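# s1 is set only on the row after each NaN (2nd row of a group): True if diff_rows == 1, else False
s1 = df1.diff_rows.eq(1).where(df1.diff_rows.isna().shift())
# keep the 2nd row when its diff_rows is not 1, otherwise keep the row after it (the 3rd row)
df1.loc[s1.eq(0) | s1.shift().eq(1)]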
Output:
    subject  trial  diff_rows
1         2     12       10.0
10        3      5        1.0
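If you would rather not depend on the NaN marker, the same selection can probably be written with explicit row positions inside each group. This is only a sketch under a few assumptions: the names pos, second_gt1, and result are mine, the 'nan' strings are converted with pd.to_numeric first, and a subject with fewer than three rows whose second row does not qualify simply yields no row.

```python
import pandas as pd

df1 = pd.DataFrame({
    'subject': [2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
    'trial': [2, 12, 13, 14, 15, 16, 17, 18, 3, 4, 5, 9, 10, 11, 12, 15],
    'diff_rows': ['nan', 10, 1, 1, 1, 1, 1, 1, 'nan', 1, 1, 4, 1, 1, 1, 3],
})

# Assumption: turn the 'nan' strings from the question into real NaN values.
df1['diff_rows'] = pd.to_numeric(df1['diff_rows'], errors='coerce')

# 0-based position of every row within its subject group.
pos = df1.groupby('subject').cumcount()

# For each subject, flag whether its second row (pos == 1) has diff_rows > 1,
# and broadcast that flag to every row of the subject.
second_gt1 = (
    (df1['diff_rows'].gt(1) & pos.eq(1))
    .groupby(df1['subject'])
    .transform('any')
)

# Keep the second row when the flag is set, otherwise the third row.
result = df1[(second_gt1 & pos.eq(1)) | (~second_gt1 & pos.eq(2))]
print(result)
```

On the sample data this keeps the rows with index 1 and 10, the same rows as the mask above. It is a bit longer, but it tests gt(1) directly as described in the question instead of eq(1), and it does not require the first row of each group to be NaN.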