So I have:
import pandas as pd
d = {'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
'date': [13, 7, 6, 12, 18, 11, 17, 5, 3, 17],
'foo': ['abc','def','def','abc','klm','abc', 'klm','xyz', 'pqr', 'klm'],
'bar': ['123','456','333','123','111','123', '111', '331', '555', '111'],
'cnt': [2, 0, 0, 1, 2, 0, 0, 0, 0, 0 ]
}
df = pd.DataFrame(d)
df
id date foo bar cnt
0 0 13 abc 123 2
1 1 7 def 456 0
2 2 6 def 333 0
3 3 12 abc 123 1
4 4 18 klm 111 2
5 5 11 abc 123 0
6 6 17 klm 111 0
7 7 5 xyz 331 0
8 8 3 pqr 555 0
9 9 17 klm 111 0
The reduction function, which for now just prints its argument (a Series):
def fun(sr):
    print(sr.keys())
    # Series.iteritems() was removed in pandas 2.0; items() is the equivalent
    for item in sr.items():
        print(item)
    print('----')
Grouping by foo and bar:
df.groupby(['foo', 'bar']).date.agg([fun])
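I need to pass to the reduction function not only date, but also the list of rows in df that match the foo and bar values in my groupby. From that list I then need to build a dictionary whose keys are the id-s and whose values are the dates. These dictionaries should be added to the original dataframe as a separate column, dicts.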
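For illustration, a minimal sketch of one way to get at both id and date for each group: agg on the date column only hands the function that one column, whereas iterating over the groupby object yields each group's full sub-DataFrame. The helper name id_date_pairs is hypothetical and not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'date': [13, 7, 6, 12, 18, 11, 17, 5, 3, 17],
    'foo':  ['abc', 'def', 'def', 'abc', 'klm', 'abc', 'klm', 'xyz', 'pqr', 'klm'],
    'bar':  ['123', '456', '333', '123', '111', '123', '111', '331', '555', '111'],
    'cnt':  [2, 0, 0, 1, 2, 0, 0, 0, 0, 0],
})

def id_date_pairs(group):
    # Hypothetical helper: `group` is the sub-DataFrame for one (foo, bar)
    # pair, so every column is available, not just date.
    return list(zip(group['id'].tolist(), group['date'].tolist()))

# Iterating over a groupby on two keys yields ((foo, bar), sub-DataFrame).
per_group = {key: id_date_pairs(group)
             for key, group in df.groupby(['foo', 'bar'])}

print(per_group[('abc', '123')])
```

This only builds one list per group; attaching the result to every row of df is a separate step.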
Update: a full example of what I need to get:
id date foo bar cnt dicts
0 0 13 abc 123 2 {('abc','123'): [(0,13), (3,12), (5,11)]}
1 1 7 def 456 0 {('def','456'): [(1,7)]}
2 2 6 def 333 0 {('def','333'): [(2,6)]}
3 3 12 abc 123 1 {('abc','123'): [(0,13), (3,12), (5,11)]}
4 4 18 klm 111 2 {('klm','111'): [(4,18), (6,17), (9,17)]}
5 5 11 abc 123 0 {('abc','123'): [(0,13), (3,12), (5,11)]}
6 6 17 klm 111 0 {('klm','111'): [(4,18), (6,17), (9,17)]}
7 7 5 xyz 331 0 {('xyz','331'): [(7,5)]}
8 8 3 pqr 555 0 {('pqr','555'): [(8,3)]}
9 9 17 klm 111 0 {('klm','111'): [(4,18), (6,17), (9,17)]}
Any ideas how to achieve this with groupby, or in some other way?
This should do the trick:
df["id_date"]=list(zip(df["id"], df["date"]))
gr=df.groupby(["foo", "bar"])
df=df.set_index(["foo", "bar"]).merge(gr["id_date"].agg(list).rename("dicts"), left_index=True, right_index=True).reset_index().drop("id_date", axis=1)
df["dicts"]=list(zip(list(zip(df["foo"], df["bar"])), df["dicts"]))
df["dicts"]=df["dicts"].map(lambda x: {x[0]: x[1]})
Output:
foo ... dicts
0 abc ... {('abc', '123'): [(0, 13), (3, 12), (5, 11)]}
1 abc ... {('abc', '123'): [(0, 13), (3, 12), (5, 11)]}
2 abc ... {('abc', '123'): [(0, 13), (3, 12), (5, 11)]}
3 def ... {('def', '333'): [(2, 6)]}
4 def ... {('def', '456'): [(1, 7)]}
5 klm ... {('klm', '111'): [(4, 18), (6, 17), (9, 17)]}
6 klm ... {('klm', '111'): [(4, 18), (6, 17), (9, 17)]}
7 klm ... {('klm', '111'): [(4, 18), (6, 17), (9, 17)]}
8 pqr ... {('pqr', '555'): [(8, 3)]}
9 xyz ... {('xyz', '331'): [(7, 5)]}
[10 rows x 6 columns]
To fill the desired column with the relevant values, we can use agg with tuple and axis=1 on the id and date columns, then groupby with agg(list), and finally reindex, as shown below:
final = df.assign(lists = df[['id','date']].agg(tuple,1).groupby([df['foo'],df['bar']])
.agg(list).reindex(df[['foo','bar']]).to_numpy())
id date foo bar cnt lists
0 0 13 abc 123 2 [(0, 13), (3, 12), (5, 11)]
1 1 7 def 456 0 [(1, 7)]
2 2 6 def 333 0 [(2, 6)]
3 3 12 abc 123 1 [(0, 13), (3, 12), (5, 11)]
4 4 18 klm 111 2 [(4, 18), (6, 17), (9, 17)]
5 5 11 abc 123 0 [(0, 13), (3, 12), (5, 11)]
6 6 17 klm 111 0 [(4, 18), (6, 17), (9, 17)]
7 7 5 xyz 331 0 [(7, 5)]
8 8 3 pqr 555 0 [(8, 3)]
9 9 17 klm 111 0 [(4, 18), (6, 17), (9, 17)]
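The chain above yields plain lists rather than the dicts asked for in the question. Below is a hedged, step-by-step sketch of the same idea that also wraps each list in a dict keyed by (foo, bar). It is a rewrite under assumptions, not the answer's own code: the tuples are built with zip instead of agg(tuple, 1), the reindex target is an explicit MultiIndex, and the intermediate names (pairs, grouped, row_keys, aligned) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'date': [13, 7, 6, 12, 18, 11, 17, 5, 3, 17],
    'foo':  ['abc', 'def', 'def', 'abc', 'klm', 'abc', 'klm', 'xyz', 'pqr', 'klm'],
    'bar':  ['123', '456', '333', '123', '111', '123', '111', '331', '555', '111'],
    'cnt':  [2, 0, 0, 1, 2, 0, 0, 0, 0, 0],
})

# Step 1: one (id, date) tuple per row.
pairs = pd.Series(list(zip(df['id'].tolist(), df['date'].tolist())),
                  index=df.index)

# Step 2: collect the tuples of each (foo, bar) group into a list.
grouped = pairs.groupby([df['foo'], df['bar']]).agg(list)

# Step 3: look up each row's group, which restores the original row order.
row_keys = pd.MultiIndex.from_frame(df[['foo', 'bar']])
aligned = grouped.reindex(row_keys).to_numpy()

# Step 4: wrap each list in a dict keyed by its (foo, bar) pair.
final = df.assign(
    lists=aligned,
    dicts=[{key: lst} for key, lst in zip(row_keys, aligned)],
)
print(final[['id', 'dicts']])
```

Unlike the set_index/merge approach, this keeps the rows in their original order, because the grouped result is reindexed by each row's own key.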
The simplest approach is to create a custom function and use Index.map:
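A minimal sketch of how a custom function combined with Index.map might look. This is an illustration under assumptions, not necessarily the code this answer had in mind: the helper make_dict and the names mapping and row_keys are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'date': [13, 7, 6, 12, 18, 11, 17, 5, 3, 17],
    'foo':  ['abc', 'def', 'def', 'abc', 'klm', 'abc', 'klm', 'xyz', 'pqr', 'klm'],
    'bar':  ['123', '456', '333', '123', '111', '123', '111', '331', '555', '111'],
    'cnt':  [2, 0, 0, 1, 2, 0, 0, 0, 0, 0],
})

def make_dict(key, group):
    # Hypothetical custom function: build {(foo, bar): [(id, date), ...]}
    # from the sub-DataFrame of one group.
    return {key: list(zip(group['id'].tolist(), group['date'].tolist()))}

# One dict per (foo, bar) group, keyed by the group's tuple.
mapping = {key: make_dict(key, group)
           for key, group in df.groupby(['foo', 'bar'])}

# Each element of the MultiIndex is a (foo, bar) tuple, so Index.map can
# look up the matching dict for every row, in the original row order.
row_keys = pd.MultiIndex.from_frame(df[['foo', 'bar']])
df['dicts'] = row_keys.map(mapping.get).tolist()

print(df[['id', 'dicts']])
```

Passing mapping.get as a callable keeps the lookup explicit: every row's (foo, bar) tuple is looked up as a plain dict key.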