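I have a pd.DataFrame that is the result of .groupby(['Product', 'Salesperson']).sum(). I now want to sort the Product level by total sales per product (not by the individual Product/Salesperson sums), and then, within each product group, sort by each salesperson's sales.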
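The code below builds my starting df and my desired answer a1, and includes a simple assertion test. In a1, the products are ordered by total sales (Bags 550, Pencil 500, Paper 60, Soap 60), and within each product the salespeople are ordered by their own sales, highest first.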
import pandas as pd
from pandas.testing import assert_frame_equal
import numpy as np
s1 = {'Product': {0: 'Soap',
                  1: 'Soap',
                  2: 'Pencil',
                  3: 'Paper',
                  4: 'Paper',
                  5: 'Bags',
                  6: 'Bags'},
      'Salesperson': {0: 'Jack',
                      1: 'Jill',
                      2: 'Jill',
                      3: 'Jack',
                      4: 'Barry',
                      5: 'Barry',
                      6: 'Jack'},
      'Sales': {0: 40, 1: 20, 2: 500, 3: 50, 4: 10, 5: 450, 6: 100}}
a1 = {'Product': {0: 'Bags',
                  1: 'Bags',
                  2: 'Pencil',
                  3: 'Paper',
                  4: 'Paper',
                  5: 'Soap',
                  6: 'Soap'},
      'Salesperson': {0: 'Barry',
                      1: 'Jack',
                      2: 'Jill',
                      3: 'Jack',
                      4: 'Barry',
                      5: 'Jack',
                      6: 'Jill'},
      'Sales': {0: 450, 1: 100, 2: 500, 3: 50, 4: 10, 5: 40, 6: 20}}
df = pd.DataFrame(s1).set_index(['Product', 'Salesperson']) # sample
a1 = pd.DataFrame(a1).set_index(['Product', 'Salesperson']) # desired answer
print(df)
print(a1)
def my_sort(df):
    raise NotImplementedError
my_answer = my_sort(df)
assert_frame_equal(my_answer, a1)
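You can use transform to add a helper column holding each product's total sales, sort on that column together with Sales, and then drop it: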
df['sums'] = df.groupby('Product')['Sales'].transform('sum')
print(df.sort_values(['sums', 'Sales'], ascending=False).drop('sums', axis=1))
Output:
                     Sales
Product Salesperson
Bags    Barry          450
        Jack           100
Pencil  Jill           500
Paper   Jack            50
Soap    Jack            40
        Jill            20
Paper   Barry           10
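Note that Paper and Soap both total 60 in the sample data, so sorting on the sums column alone interleaves their rows, and the output above does not match a1. One possible fix is to add the Product index level as a tie-breaker between the total and the per-salesperson sales. The sketch below is a minimal version of that idea, not a definitive implementation. It assumes that ties should be broken alphabetically by product name (which is the order a1 happens to show) and that your pandas version accepts index level names in `sort_values` (0.23 or later). The `rows` and `expected_rows` lists are just a compact restatement of the question's data.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def my_sort(df):
    # Per-product total, broadcast back onto every row of that product.
    sums = df.groupby(level='Product')['Sales'].transform('sum')
    return (
        df.assign(sums=sums)  # assign returns a copy; the caller's df is untouched
          # Total descending, then product name ascending (assumed tie-break),
          # then each salesperson's sales descending.
          .sort_values(['sums', 'Product', 'Sales'],
                       ascending=[False, True, False])
          .drop(columns='sums')
    )


# Same data as in the question, written as records for brevity.
rows = [('Soap', 'Jack', 40), ('Soap', 'Jill', 20), ('Pencil', 'Jill', 500),
        ('Paper', 'Jack', 50), ('Paper', 'Barry', 10),
        ('Bags', 'Barry', 450), ('Bags', 'Jack', 100)]
expected_rows = [('Bags', 'Barry', 450), ('Bags', 'Jack', 100),
                 ('Pencil', 'Jill', 500),
                 ('Paper', 'Jack', 50), ('Paper', 'Barry', 10),
                 ('Soap', 'Jack', 40), ('Soap', 'Jill', 20)]
cols = ['Product', 'Salesperson', 'Sales']

df = pd.DataFrame(rows, columns=cols).set_index(['Product', 'Salesperson'])
a1 = pd.DataFrame(expected_rows, columns=cols).set_index(['Product', 'Salesperson'])

my_answer = my_sort(df)
assert_frame_equal(my_answer, a1)
print(my_answer)
```

If a different tie-break is wanted (for example, keeping the products' original order), the middle sort key would need to change accordingly.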