Multi-level headers in a pandas DataFrame

Question (votes: 0, answers: 1)

I have the following table in my CSV file:

wi_document_id  wir_rejected_by  wir_reason  wir_sys_created_on
Int0002277      Agent_1          Time out    3/8/2024 11:18:10 AM
Int0002278      Agent_1          Time out    2/26/2024 12:18:16 AM
Int0002279      Agent_2                      3/11/2024 09:18:31 AM
Int0002280      Agent_2          Time out    3/18/2024 10:45:08 AM
Int0002281      Agent_2          Time out    3/4/2024 10:18:22 AM
Int0002282      Agent_3          Time out    3/18/2024 11:20:51 AM
Int0002283      Agent_3                      2/29/2024 08:13:04 AM
Int0002284      Agent_4          Time out    3/4/2024 09:30:45 AM
Int0002285      Agent_4                      3/12/2024 10:18:34 AM

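I have the script below to calculate:

  1. The "rejection count" per agent per week.
  2. The "rejection count" with reason = "Time out" per agent per week.
  3. The "rejection count" with reason = "Busy" per agent per week.

Script: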

import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('Rejection Report.csv')

# Convert 'wir_sys_created_on' column to datetime
df['wir_sys_created_on'] = pd.to_datetime(df['wir_sys_created_on'])

# Extract week numbers from the datetime column starting from 1 and format with ISO week number and the date of the Monday
df['week_number'] = df['wir_sys_created_on'] - pd.to_timedelta(df['wir_sys_created_on'].dt.dayofweek, unit='d')
df['week_number'] = 'Week ' + df['week_number'].dt.strftime('%V') + ' (' + df['week_number'].dt.strftime('%Y-%m-%d') + ')'

# Group by agent, week number, and rejection reason
grouped = df.groupby(['wir_rejected_by', 'week_number', 'wir_reason'])

# Calculate rejection count by reason per week
rejection_by_reason = grouped.size().unstack(fill_value=0)

# Calculate total rejection count per week
weekly_rejection_count = df.groupby(['wir_rejected_by', 'week_number']).size().unstack(fill_value=0)

# Filter rejection counts based on reasons 'Time out' and 'Busy'
rejection_timeout = rejection_by_reason['Time out'].unstack(fill_value=0)
rejection_busy = rejection_by_reason['Busy'].unstack(fill_value=0)

# Concatenate DataFrames with a multi-level column index
df_with_multiindex = pd.concat(
    [weekly_rejection_count, rejection_timeout, rejection_busy],
    axis=1,
    keys=['Total Rejections', 'Rejections due to Time out', 'Rejections due to Busy'],
    names=['', '']
)

# Ensure weeks are ordered chronologically
df_with_multiindex = df_with_multiindex.reindex(sorted(df_with_multiindex.columns), axis=1)

# Apply some formatting
styled_df = df_with_multiindex.style.format("{:.0f}")
styled_df = styled_df.set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'center')]},
    {'selector': 'td', 'props': [('text-align', 'center')]},
    {'selector': 'caption', 'props': [('caption-side', 'bottom')]}
])

# Set the caption
styled_df = styled_df.set_caption('Rejections Report')

# Display the styled DataFrame
styled_df.set_properties(**{'border-collapse': 'collapse', 'border': '1px solid black'})
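
The week-label step in the script above can be checked in isolation. This is a minimal sketch, assuming two timestamps taken from the sample table and rewritten in ISO form; the names `ts`, `monday`, and `labels` are illustrative and not part of the original script. It also assumes the platform's strftime supports `%V`.

import pandas as pd

# Two timestamps from the sample table, written in ISO form (assumption for clarity)
ts = pd.Series(pd.to_datetime(["2024-03-08 11:18:10", "2024-02-26 00:18:16"]))

# Step back to the Monday of each timestamp's week (dayofweek: Monday = 0)
monday = ts - pd.to_timedelta(ts.dt.dayofweek, unit="D")

# Build the label: zero-padded ISO week number plus the date of that Monday
labels = "Week " + monday.dt.strftime("%V") + " (" + monday.dt.strftime("%Y-%m-%d") + ")"
print(labels.tolist())

Because the week number is zero-padded, sorting these labels as strings keeps weeks within one year in chronological order.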

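The calculation part works fine, but the multi-level column headers are set up incorrectly:

The rejection-reason and total-rejection headers sit above the week numbers, which causes the week numbers to be repeated.

I need the table header to look like the following, with column and cell borders:

The week numbers should be on the top header level, with the calculated columns nested beneath them, rather than repeating the week number for each calculated column.

Any suggestions on how to achieve the required structure?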

pandas header multi-level
1 answer
0 votes

"The calculation part works fine, but the multi-level column headers are set up incorrectly.."

I would do the styling part this way:

# to be adjusted
TCOLOR, BGCOLOR = "black", "lightcyan"

CSS = [
    {
        "selector": "td, th[class^='col'], "
        "th[class^='row'], .index_name.level1",
        "props": [
            ("text-align", "center"), ("width", "100px"),
            ("color", TCOLOR), ("background-color", BGCOLOR),
            ("border", "1px solid black"),
        ],
    },
    {"selector": "caption", "props": [("caption-side", "bottom")]},
]

df_styled = (
    df_with_multiindex.rename_axis(
        index=None, columns=("wir_rejected_by", None)
    ).swaplevel(axis=1).sort_index(axis=1, level=0)
    .style.set_caption("Rejections Report")
    .set_table_styles(CSS)
)

Output (in a notebook):
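
The rendered table is not reproduced here. As a plain-text check of the column reordering that `swaplevel` and `sort_index` perform, here is a minimal sketch. The counts and the names `total`, `timeout`, `wide`, and `by_week` are hypothetical and only stand in for the frames built in the question's script.

import pandas as pd

# Hypothetical per-agent counts: rows are agents, columns are week labels
weeks = ["Week 09 (2024-02-26)", "Week 10 (2024-03-04)"]
agents = ["Agent_1", "Agent_2"]
total = pd.DataFrame([[1, 1], [0, 2]], index=agents, columns=weeks)
timeout = pd.DataFrame([[1, 1], [0, 1]], index=agents, columns=weeks)

# concat with keys puts the metric name on the outer column level: (metric, week)
wide = pd.concat(
    [total, timeout],
    axis=1,
    keys=["Total Rejections", "Rejections due to Time out"],
)

# Swap the two column levels to (week, metric), then sort so that
# each week's metrics sit side by side under a single week header
by_week = wide.swaplevel(axis=1).sort_index(axis=1, level=0)

print(by_week.columns.tolist())

After the swap, each week label appears as one top-level group, and the metric columns are nested under it in alphabetical order.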
