Returning every unique combination within a group

Question

Here is the problem.

Suppose we have a pandas DataFrame that can be generated as follows.

import pandas as pd

month = ['dec', 'dec', 'dec', 'jan', 'feb', 'feb', 'mar', 'mar']
category = ['a', 'a', 'b', 'b', 'a', 'b', 'b', 'b']
sales = [1, 10, 2, 5, 12, 4, 3, 1]

# One row per sale: month, category and amount
df = pd.DataFrame(list(zip(month, category, sales)),
                  columns=['month', 'cat', 'sales'])

print(df)

  month cat  sales
0   dec   a      1
1   dec   a     10
2   dec   b      2
3   jan   b      5
4   feb   a     12
5   feb   b      4
6   mar   b      3
7   mar   b      1

Now suppose we want the total sales of each category per month.

So we do something like this:

df=df.groupby(['month','cat']).sales.sum().reset_index()
print(df)

  month cat  sales
0   dec   a     11
1   dec   b      2
2   feb   a     12
3   feb   b      4
4   jan   b      5
5   mar   b      4

But what we would like to see is:

  month cat  sales
0   dec   a     11
1   dec   b      2
2   feb   a     12
3   feb   b      4
4   jan   b      5
5   jan   a      0
6   mar   b      4
7   mar   a      0

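The difference is that a category that did not occur in a given month still appears, with zero as its total.

This has probably been asked before, but I couldn't find it. If you point me to the existing question, I will delete this one.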

Answer 1

Continuing from where you left off, a combination of unstack and stack will give you the required output.

res = (df.groupby(['month','cat'])
       .sales
       .sum()
       #unstack and fill value for the null column
       .unstack(fill_value=0)
       #return to groupby form and reset
       .stack()
       .reset_index(name='sales')
      )

res

  month cat  sales
0   dec   a     11
1   dec   b      2
2   feb   a     12
3   feb   b      4
4   jan   a      0
5   jan   b      5
6   mar   a      0
7   mar   b      4
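
To see why this works, it may help to look at the intermediate wide table. The sketch below rebuilds the question's sample data and splits the chain into two steps; the variable names `totals`, `wide` and `long_form` are only illustrative and are not part of the answer above.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'month': ['dec', 'dec', 'dec', 'jan', 'feb', 'feb', 'mar', 'mar'],
    'cat':   ['a', 'a', 'b', 'b', 'a', 'b', 'b', 'b'],
    'sales': [1, 10, 2, 5, 12, 4, 3, 1],
})

totals = df.groupby(['month', 'cat']).sales.sum()

# Wide form: one row per month, one column per category.
# A (month, cat) pair missing from the data becomes a cell filled with 0.
wide = totals.unstack(fill_value=0)
print(wide)

# Back to long form: every cell of the wide table becomes one row.
long_form = wide.stack().reset_index(name='sales')
print(long_form)
```

Because the wide table has a cell for every month and category pair, stacking it back yields a row for each pair, including the ones that were absent from the original data.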

Answer 2

Use a MultiIndex with reindex, as follows:

df=(
    df.groupby(['month','cat']).sales.sum()
    .reindex(pd.MultiIndex.from_product([df.month.unique(), df.cat.unique()], 
                                   names=['month', 'cat']), fill_value=0)
    .reset_index()
)

print(df)
  month cat  sales
0   dec   a     11
1   dec   b      2
2   feb   a     12
3   feb   b      4
4   jan   a      0
5   jan   b      5
6   mar   a      0
7   mar   b      4
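
If this pattern is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch: the function name `sum_over_all_combinations` and its parameters are hypothetical, and it assumes that every key value that should appear in the result occurs at least once in the input.

```python
import pandas as pd

# Hypothetical helper, not part of pandas or of the answer above.
def sum_over_all_combinations(frame, keys, value):
    """Sum `value` per group, adding a zero row for each key combination missing from `frame`."""
    totals = frame.groupby(keys)[value].sum()
    # Cartesian product of the unique values observed in each key column.
    full_index = pd.MultiIndex.from_product(
        [frame[k].unique() for k in keys], names=keys
    )
    return totals.reindex(full_index, fill_value=0).reset_index()

# Same sample data as in the question.
df = pd.DataFrame({
    'month': ['dec', 'dec', 'dec', 'jan', 'feb', 'feb', 'mar', 'mar'],
    'cat':   ['a', 'a', 'b', 'b', 'a', 'b', 'b', 'b'],
    'sales': [1, 10, 2, 5, 12, 4, 3, 1],
})

result = sum_over_all_combinations(df, ['month', 'cat'], 'sales')
print(result)
```

Note that the row order follows the order in which the values first appear in each key column, so the result may need a `sort_values` call if a sorted layout is wanted.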

Answer 3

Another way, without groupby but with pivot_table and stack:

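# aggfunc='sum' (the string) avoids the warning newer pandas raises for the builtin sum;
# name='sales' labels the stacked values instead of leaving the column named 0
df_ = df.pivot_table(index='month', columns='cat',
                     values='sales', aggfunc='sum', fill_value=0)\
        .stack().reset_index(name='sales')
print(df_)
  month cat  sales
0   dec   a     11
1   dec   b      2
2   feb   a     12
3   feb   b      4
4   jan   a      0
5   jan   b      5
6   mar   a      0
7   mar   b      4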
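
As a quick sanity check, the three approaches can be compared on the question's sample data. This is a sketch rather than part of any answer: the names `via_unstack`, `via_reindex`, `via_pivot` and `normalise` are illustrative, and the pivot variant passes `name='sales'` to `reset_index` so that all three frames share the same column names.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'month': ['dec', 'dec', 'dec', 'jan', 'feb', 'feb', 'mar', 'mar'],
    'cat':   ['a', 'a', 'b', 'b', 'a', 'b', 'b', 'b'],
    'sales': [1, 10, 2, 5, 12, 4, 3, 1],
})

totals = df.groupby(['month', 'cat']).sales.sum()

# Answer 1: unstack with a fill value, then stack back.
via_unstack = totals.unstack(fill_value=0).stack().reset_index(name='sales')

# Answer 2: reindex against the full product of months and categories.
full_index = pd.MultiIndex.from_product(
    [df['month'].unique(), df['cat'].unique()], names=['month', 'cat']
)
via_reindex = totals.reindex(full_index, fill_value=0).reset_index()

# Answer 3: pivot_table, then stack.
via_pivot = (df.pivot_table(index='month', columns='cat', values='sales',
                            aggfunc='sum', fill_value=0)
               .stack()
               .reset_index(name='sales'))

def normalise(frame):
    # Put rows in a common order so only the contents are compared.
    return frame.sort_values(['month', 'cat']).reset_index(drop=True)

# check_dtype=False: only the values matter for this comparison.
pd.testing.assert_frame_equal(normalise(via_unstack), normalise(via_reindex), check_dtype=False)
pd.testing.assert_frame_equal(normalise(via_unstack), normalise(via_pivot), check_dtype=False)
print(normalise(via_unstack))
```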
