Change the bin ranges of a pandas pivot table


I have a pandas pivot table, and I want to change the bin ranges so that every bin starts counting from 0.

Hour_Num      (0, 12]                   (12, 15]                  (15, 20]                  (20, 24]
              today_qty  yesterday_qty  today_qty  yesterday_qty  today_qty  yesterday_qty  today_qty  yesterday_qty
channel_name
Ajio                 22             68          0             55          0             53          0             32
Amazon                3              6          0              3          0              3          0              0
D2C                   0              0          0              1          0              0          0              0
Flipkart             25             32          0             18          0             42          0             26
Limeroad              1              0          0              0          0              1          0              0
Meesho                3              7          0              3          0              1          0              0
Myntra               61            102          0             53          0             96          0             55
Nykaa                12              8          0             10          0             14          0             18
Snapdeal              0              0          0              0          0              0          0              1
TataCliq              3              9          0              2          0              5          0              5

I want the bins to be (0, 12], (0, 15], (0, 20], (0, 24]. That is, I want to show the totals from the start of the day up to 12 noon, 3 PM, 8 PM, and 12 midnight.

12: 12 noon, 15: 3 PM, 20: 8 PM, 24: 12 midnight

Here is my code:

df['Hour_Num'] = pd.cut(df.order_hour,[0,12,15,20,24])

pivot_df = df.pivot_table(index='channel_name', values=(['yesterday_qty','today_qty']), columns=['Hour_Num'], aggfunc=('sum')).fillna(0)

pivot_df = pivot_df.swaplevel(0,1, axis=1).sort_index(axis=1)

I would appreciate any hints or solutions. Thank you.

python pandas dataframe pivot-table data-analysis
1 Answer

Is something like this what you are looking for?

import pandas as pd
import numpy as np

time_periods = ('(0, 12]', '(12, 15]', '(15, 20]', '(20, 24]')
quantities_days = ('today_qty', 'yesterday_qty')

columns_names = pd.MultiIndex.from_product((time_periods, quantities_days), names=('Hour_Num', 'Qty'))

channel_index = pd.Index(
    data=('Ajio', 'Amazon', 'D2C', 'Flipkart', 'Limeroad', 'Meesho', 'Myntra', 'Nykaa', 'Snapdeal', 'TataCliq'),
    name='channel_name'
)

values = np.array(
    [
        [22,  68, 0, 55, 0, 53, 0, 32],
        [ 3,   6, 0,  3, 0,  3, 0,  0],
        [ 0,   0, 0,  1, 0,  0, 0,  0],
        [25,  32, 0, 18, 0, 42, 0, 26],
        [ 1,   0, 0,  0, 0,  1, 0,  0],
        [ 3,   7, 0,  3, 0,  1, 0,  0],
        [61, 102, 0, 53, 0, 96, 0, 55],
        [12,   8, 0, 10, 0, 14, 0, 18],
        [ 0,   0, 0,  0, 0,  0, 0,  1],
        [ 3,   9, 0,  2, 0,  5, 0,  5]
    ]
)

pivot_df = pd.DataFrame(values, index=channel_index, columns=columns_names)

print('Starting point:')
print(pivot_df)

print("\n\n")

########

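# Map each original bin label to its cumulative "from 0" label.
# '(0, 12]' already starts at 0, so it keeps its name.
time_periods_renamed = {
    '(12, 15]' : '(0, 15]', 
    '(15, 20]' : '(0, 20]', 
    '(20, 24]' : '(0, 24]'
}

# Transpose, take a running total across the time bins separately for
# today_qty and yesterday_qty (level 1), transpose back, then relabel.
df = pivot_df.T.groupby(level=1).cumsum().T.rename(columns=time_periods_renamed)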

print('Result')
print(df)

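What I did here:

  1. Create a pivot_df dataframe that looks just like the one you have shown
  2. Transpose that dataframe, so its rows become columns and vice versa
  3. Split those rows into groups by the level that holds today_qty and yesterday_qty
  4. Apply a cumulative sum to each group
  5. Transpose the result back, so it looks similar to our starting point
  6. Rename the columns to the new values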
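
If the raw order rows are still available, the same cumulative windows could also be built directly, without pivoting first. The following is only a minimal sketch under assumptions: the sample rows are made up for illustration, and only the column names (channel_name, order_hour, today_qty, yesterday_qty) are taken from the question's code. For each upper bound it sums every order placed after hour 0 and up to that hour, which mirrors the right-closed bins that pd.cut produces in the question.

```python
import pandas as pd

# Hypothetical raw data; only the column names come from the question.
df = pd.DataFrame({
    'channel_name':  ['Ajio', 'Ajio', 'Amazon', 'Amazon', 'Ajio'],
    'order_hour':    [3, 14, 10, 22, 18],
    'today_qty':     [2, 1, 4, 0, 5],
    'yesterday_qty': [1, 3, 0, 2, 1],
})

upper_bounds = [12, 15, 20, 24]
qty_columns = ['today_qty', 'yesterday_qty']

# One grouped sum per window (0, upper]. Like the question's pd.cut bins,
# this excludes order_hour == 0 because the interval is open on the left.
parts = {}
for upper in upper_bounds:
    in_window = (df['order_hour'] > 0) & (df['order_hour'] <= upper)
    parts[f'(0, {upper}]'] = (
        df[in_window].groupby('channel_name')[qty_columns].sum()
    )

# The dict keys become the outer column level; channels with no orders
# in a window show up as NaN after the concat, so fill them with 0.
cumulative_df = pd.concat(parts, axis=1).fillna(0).astype(int)
cumulative_df.columns.names = ['Hour_Num', 'Qty']

print(cumulative_df)
```

Whether this or the cumulative sum over the existing pivot table is preferable depends on what is at hand: the cumsum approach reuses the pivot table as it is, while this variant never creates the non-overlapping bins in the first place.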