I have a DataFrame like this, and I want the count of all non-zero INTERATION values per date (and location):
DATE LOC EMAIL INTERATION
1/11 INDIA [email protected] 0
1/11 INDIA [email protected] 11
1/11 LONDON [email protected] 2
2/11 INDIA [email protected] 5
2/11 INDIA [email protected] 5
2/11 LONDON [email protected] 0
3/11 LONDON [email protected] 1
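A minimal sketch that rebuilds this sample as a pandas DataFrame so the snippets in the answers can be run as-is. The e-mail addresses are obfuscated in the source, so the ones used here are hypothetical placeholders. Only the DATE, LOC and INTERATION values come from the question.

```python
import pandas as pd

# Sample data from the question. The e-mail addresses are hypothetical
# placeholders because the originals are obfuscated in the source.
df = pd.DataFrame({
    'DATE': ['1/11', '1/11', '1/11', '2/11', '2/11', '2/11', '3/11'],
    'LOC': ['INDIA', 'INDIA', 'LONDON', 'INDIA', 'INDIA', 'LONDON', 'LONDON'],
    'EMAIL': ['a@example.com', 'b@example.com', 'a@example.com',
              'a@example.com', 'b@example.com', 'c@example.com', 'c@example.com'],
    'INTERATION': [0, 11, 2, 5, 5, 0, 1],
})
print(df)
```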
So the resulting DataFrame should look like this:
DATE LOC INTERATION
1/11 INDIA 1
1/11 LONDON 1
2/11 INDIA 2
2/11 LONDON 0
3/11 LONDON 1
Thanks in advance.
Use groupby with agg and numpy.count_nonzero:
import numpy as np

df1 = df.groupby(['DATE','LOC'], as_index=False)['INTERATION'].agg(np.count_nonzero)
print(df1)
DATE LOC INTERATION
0 1/11 INDIA 1
1 1/11 LONDON 1
2 2/11 INDIA 2
3 2/11 LONDON 0
4 3/11 LONDON 1
Another solution is to create a boolean mask by comparing for "not equal" with ne, convert it to integers and aggregate with sum:
df1 = (df.assign(INTERATION = df['INTERATION'].ne(0).astype(int))
.groupby(['DATE','LOC'], as_index=False)['INTERATION']
.sum())
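A self-contained run of the ne/sum variant on the sample data, checked against the count_nonzero version. The EMAIL column is left out because it is not part of the grouping here, and the names via_sum and via_count are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'DATE': ['1/11', '1/11', '1/11', '2/11', '2/11', '2/11', '3/11'],
    'LOC': ['INDIA', 'INDIA', 'LONDON', 'INDIA', 'INDIA', 'LONDON', 'LONDON'],
    'INTERATION': [0, 11, 2, 5, 5, 0, 1],
})

# ne(0) is True where the value is non-zero; astype(int) turns that into 1/0,
# so summing per group counts the non-zero values.
via_sum = (df.assign(INTERATION=df['INTERATION'].ne(0).astype(int))
             .groupby(['DATE', 'LOC'], as_index=False)['INTERATION']
             .sum())

# The count_nonzero version, for comparison.
via_count = df.groupby(['DATE', 'LOC'], as_index=False)['INTERATION'].agg(np.count_nonzero)

print(via_sum)
print(via_count)
```

Both should list the counts 1, 1, 2, 0, 1 for the five DATE/LOC groups. The ne/sum form stays in built-in, vectorized pandas operations, which is the usual reason to prefer it over a per-group NumPy call.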
If the EMAIL column also needs to be grouped:
df2 = df.groupby(['DATE','LOC','EMAIL'], as_index=False)['INTERATION'].agg(np.count_nonzero)
print (df2)
DATE LOC EMAIL INTERATION
0 1/11 INDIA [email protected] 1
1 1/11 INDIA [email protected] 0
2 1/11 LONDON [email protected] 1
3 2/11 INDIA [email protected] 1
4 2/11 INDIA [email protected] 1
5 2/11 LONDON [email protected] 0
6 3/11 LONDON [email protected] 1
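A workaround, though not necessarily efficient, is to convert to bool and then sum. This relies on 0 / 1 and False / True being equivalent in arithmetic: 0 becomes False, any non-zero value becomes True, and summing counts the True values: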
res = (df.groupby(['DATE', 'LOC'])['INTERATION']
         .apply(lambda x: x.astype(bool).sum())
         .reset_index())
print(res)
DATE LOC INTERATION
0 1/11 INDIA 1
1 1/11 LONDON 1
2 2/11 INDIA 2
3 2/11 LONDON 0
4 3/11 LONDON 1
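The apply with a lambda is a likely reason this workaround is slow, because it calls a Python function once per group. A sketch of the same bool-then-sum idea without apply converts the whole column once and lets groupby do the summing. This is an alternative formulation, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'DATE': ['1/11', '1/11', '1/11', '2/11', '2/11', '2/11', '3/11'],
    'LOC': ['INDIA', 'INDIA', 'LONDON', 'INDIA', 'INDIA', 'LONDON', 'LONDON'],
    'INTERATION': [0, 11, 2, 5, 5, 0, 1],
})

# Convert the whole column to bool once (0 -> False, anything else -> True),
# then sum per group: False/True count as 0/1, so each sum is a non-zero count.
res = (df['INTERATION'].astype(bool)
         .groupby([df['DATE'], df['LOC']])
         .sum()
         .reset_index())
print(res)
```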