I have the following table in my CSV file:
wi_document_id | wir_rejected_by | wir_reason | wir_sys_created_on |
---|---|---|---|
Int0002277 | Agent_1 | Time out | 3/8/2024 11:18:10 AM |
Int0002278 | Agent_1 | Time out | 2/26/2024 12:18:16 AM |
Int0002279 | Agent_2 | Busy | 3/11/2024 09:18:31 AM |
Int0002280 | Agent_2 | Time out | 3/18/2024 10:45:08 AM |
Int0002281 | Agent_2 | Time out | 3/4/2024 10:18:22 AM |
Int0002282 | Agent_3 | Time out | 3/18/2024 11:20:51 AM |
Int0002283 | Agent_3 | Busy | 2/29/2024 08:13:04 AM |
Int0002284 | Agent_4 | Time out | 3/4/2024 09:30:45 AM |
Int0002285 | Agent_4 | Busy | 3/12/2024 10:18:34 AM |
I have the script below to do the calculation:
Script:
import pandas as pd
# Load the CSV file into a DataFrame
df = pd.read_csv('Rejection Report.csv')
# Convert 'wir_sys_created_on' column to datetime
df['wir_sys_created_on'] = pd.to_datetime(df['wir_sys_created_on'])
# Extract week numbers from the datetime column starting from 1 and format with ISO week number and the date of the Monday
df['week_number'] = df['wir_sys_created_on'] - pd.to_timedelta(df['wir_sys_created_on'].dt.dayofweek, unit='d')
df['week_number'] = 'Week ' + df['week_number'].dt.strftime('%V') + ' (' + df['week_number'].dt.strftime('%Y-%m-%d') + ')'
# Group by agent, week number, and rejection reason
grouped = df.groupby(['wir_rejected_by', 'week_number', 'wir_reason'])
# Calculate rejection count by reason per week
rejection_by_reason = grouped.size().unstack(fill_value=0)
# Calculate total rejection count per week
weekly_rejection_count = df.groupby(['wir_rejected_by', 'week_number']).size().unstack(fill_value=0)
# Filter rejection counts based on reasons 'Time out' and 'Busy'
rejection_timeout = rejection_by_reason['Time out'].unstack(fill_value=0)
rejection_busy = rejection_by_reason['Busy'].unstack(fill_value=0)
# Concatenate DataFrames with a multi-level column index
df_with_multiindex = pd.concat(
    [weekly_rejection_count, rejection_timeout, rejection_busy],
    axis=1,
    keys=['Total Rejections', 'Rejections due to Time out', 'Rejections due to Busy'],
    names=['', '']
)
# Ensure weeks are ordered chronologically
df_with_multiindex = df_with_multiindex.reindex(sorted(df_with_multiindex.columns), axis=1)
# Apply some formatting
styled_df = df_with_multiindex.style.format("{:.0f}")
styled_df = styled_df.set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'center')]},
    {'selector': 'td', 'props': [('text-align', 'center')]},
    {'selector': 'caption', 'props': [('caption-side', 'bottom')]}
])
# Set the caption
styled_df = styled_df.set_caption('Rejections Report')
# Display the styled DataFrame
styled_df.set_properties(**{'border-collapse': 'collapse', 'border': '1px solid black'})
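The calculation part is fine, but the multi-level column header is set up incorrectly:
The rejection-reason and total-rejection headers sit above the week numbers, so the week numbers are repeated under each of them.
I need the table header to look like the following, with borders on the columns and cells:
The week numbers should be the top-level header, with the calculated columns nested beneath each week, rather than repeating the weeks for every calculated column.
Any suggestions on how to achieve the required structure?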
"The calculation part is fine, but the multi-level column header is set up incorrectly.."
I would do the styling part this way:
# to be adjusted
TCOLOR, BGCOLOR = "black", "lightcyan"
CSS = [
    {
        "selector": "td, th[class^='col'], "
                    "th[class^='row'], .index_name.level1",
        "props": [
            ("text-align", "center"), ("width", "100px"),
            ("color", TCOLOR), ("background-color", BGCOLOR),
            ("border", "1px solid black"),
        ],
    },
    {"selector": "caption", "props": [("caption-side", "bottom")]},
]
df_styled = (
    df_with_multiindex.rename_axis(
        index=None, columns=("wir_rejected_by", None)
    ).swaplevel(axis=1).sort_index(axis=1, level=0)
    .style.set_caption("Rejections Report")
    .set_table_styles(CSS)
)
Output (in a notebook):
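The header restructuring itself can be checked without the styler. Below is a minimal sketch of just the `swaplevel` + `sort_index` step on a small hand-built frame. The counts, the two week labels, and the names `metric_on_top`, `weeks_on_top`, and `ordered` are made up for illustration and are not taken from the real CSV.

```python
import pandas as pd

# Hypothetical week labels and metric names, shaped like the question's output.
weeks = ["Week 10 (2024-03-04)", "Week 11 (2024-03-11)"]
metrics = [
    "Total Rejections",
    "Rejections due to Time out",
    "Rejections due to Busy",
]

# Made-up counts with the metric on the top column level and the week
# below it, i.e. the layout that pd.concat(..., keys=[...]) produces.
metric_on_top = pd.DataFrame(
    [[2, 1, 2, 0, 0, 1],
     [1, 2, 1, 1, 0, 1]],
    index=["Agent_1", "Agent_2"],
    columns=pd.MultiIndex.from_product([metrics, weeks]),
)

# Swap the two column levels so the week is on top, then sort so that
# all columns belonging to one week sit next to each other.
weeks_on_top = metric_on_top.swaplevel(axis=1).sort_index(axis=1, level=0)

# sort_index also sorts the metric names alphabetically within each week.
# If the original metric order matters, reindex with an explicit column order.
ordered = metric_on_top.swaplevel(axis=1).reindex(
    columns=pd.MultiIndex.from_product([weeks, metrics])
)

print(weeks_on_top)
print(ordered)
```

Each week label now appears once at the top level, with the three calculated columns nested under it. The `reindex` variant is only needed if "Total Rejections" should stay first within each week.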