When a pandas DataFrame index is set, the last element of the columns array does not merge/group items together.
Assume the following test data:
test_data = {
    "desk": ["DESK1", "DESK2", "DESK3", "DESK4", "DESK5", "DESK6", "DESK7", "DESK8", "DESK9", "DESK10"],
    "phone": ["111-1111", "111-1111", "111-1111", "111-1111", "444-4444", "444-4444", "111-1111", "111-1111", "123-4567", "123-4567"],
    "email": ["[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"],
    "team1": ["Adam", "xxxx", "Tiana", "", "Gina", "Gina", "Ruby", "Becca", "John", ""],
    "team2": ["", "", "Dime", "", "Ed", "", "", "", "Fa", "Tim"],
}
The DataFrame is created like this:
import io
import pandas as pd
from django.http.response import HttpResponse
from rest_framework import status
### Create DataFrame from test_data
df = pd.DataFrame(test_data)
I then try to write the file and return it:
### Write & return the file
with io.BytesIO() as buffer:
    with pd.ExcelWriter(buffer) as writer:
        df: pd.DataFrame = df
        groupby_columns = ['desk', 'phone', 'email']
        df.set_index(groupby_columns, inplace=True, drop=True, append=False)
        df.to_excel(writer, index=True, sheet_name="Team Matrix", merge_cells=True)
    # Read the buffer only after the writer has been closed
    return HttpResponse(
        buffer.getvalue(),
        headers={
            "Content-Type": "application/vnd.openxmlformats-" "officedocument.spreadsheetml.sheet",
            "Content-Disposition": "attachment; filename=excel-export.xlsx",
        },
        status=status.HTTP_201_CREATED,
    )
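What I want is for the first three columns (desk, phone, email) to be merged when their data is the same. With the code above, the desk and phone columns are merged, but the email column is not grouped/merged like the other two.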
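A likely explanation, offered here as an assumption rather than something stated in the post: pandas collapses repeated labels only in the outer levels of a MultiIndex and always repeats the innermost level, and the Excel export with merge_cells=True appears to follow the same rule. The sketch below shows the effect in the plain-text rendering, using hypothetical toy data rather than the test data above.

```python
import pandas as pd

# Hypothetical toy data (not the question's test_data): the first two rows
# share the same phone and email.
toy = pd.DataFrame(
    {
        "phone": ["111", "111", "222"],
        "email": ["a@x", "a@x", "b@x"],
        "team1": ["Adam", "Eve", "Gina"],
        "team2": ["x", "y", "z"],
    }
)

# With "email" as the innermost index level, the repeated phone label is
# blanked out on the second row, but the email label is printed again.
rendered = toy.set_index(["phone", "email"]).to_string()
print(rendered)

# With "team1" appended as an extra index level, "email" becomes an outer
# level and its repeated label is blanked out too.
deeper_rendered = toy.set_index(["phone", "email", "team1"]).to_string()
print(deeper_rendered)
```

If the Excel writer does behave the same way, appending one more column to the index would be an alternative to merging cells by hand, at the cost of moving that column into the index.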
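One possible solution is to put empty ("") values into the cells that should be merged, and then merge those cells.
This creates a new DataFrame with the empty cells:
def fn(x):
    # Blank out desk/phone/email on every row of the group except the first
    x.loc[x.index[0] + 1 :, ["desk", "phone", "email"]] = ""
    return x
# Rows where both team columns are empty always start a new group
empty_rows = df.loc[:, ["team1", "team2"]].eq("").all(axis=1)
# A new group starts whenever the email changes or the row is empty
groups = ((df["email"] != df["email"].shift()) | empty_rows).cumsum()
df = df.groupby(groups, group_keys=False).apply(fn)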
This prints:
     desk     phone              email  team1  team2
0   DESK1  111-1111  [email protected]   Adam
1                                        xxxx
2   DESK3  111-1111  [email protected]  Tiana   Dime
3   DESK4  111-1111  [email protected]
4   DESK5  444-4444  [email protected]   Gina     Ed
5                                        Gina
6   DESK7  111-1111     MagicSchoolbus   Ruby
7   DESK8  111-1111  [email protected]  Becca
8   DESK9  123-4567  [email protected]   John     Fa
9  DESK10  123-4567  [email protected]           Tim
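The grouping key used above relies on a common pandas idiom: compare a column with its shifted copy to flag where a new run starts, then take a cumulative sum so each consecutive run gets its own label. A minimal sketch on a hypothetical toy column (the values and the force_break mask are illustrative only):

```python
import pandas as pd

# Hypothetical toy column; the values are illustrative only.
email = pd.Series(["a@x", "a@x", "b@x", "b@x", "b@x", "a@x"])

# True wherever the value differs from the row above. The first row is
# always True because shift() puts a missing value there.
starts = email != email.shift()

# A running sum over the start flags labels each consecutive run.
run_id = starts.cumsum()
print(run_id.tolist())

# OR-ing in another boolean mask forces extra breaks, which is the role
# empty_rows plays in the answer's code.
force_break = pd.Series([False, False, False, True, False, False])
run_id_with_breaks = (starts | force_break).cumsum()
print(run_id_with_breaks.tolist())
```

Note that the last "a@x" gets its own label: only adjacent equal values are grouped, which is what cell merging needs.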
This step merges the first 3 columns in Excel:
def merge_fn(g):
    # Single-row groups have nothing to merge
    if len(g) == 1:
        return
    # +1 skips the header row written by to_excel
    first, last = g.index[0] + 1, g.index[-1] + 1
    worksheet.merge_range(first, 0, last, 0, g.iat[0, 0], merge_format)
    worksheet.merge_range(first, 1, last, 1, g.iat[0, 1], merge_format)
    worksheet.merge_range(first, 2, last, 2, g.iat[0, 2], merge_format)
writer = pd.ExcelWriter("out.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name="Team Matrix", index=False)
workbook = writer.book
worksheet = writer.sheets["Team Matrix"]
merge_format = workbook.add_format({"align": "left", "valign": "top", "border": 0})
# Called only for its side effect of merging cells; the return value is unused
df.groupby(groups, group_keys=False).apply(merge_fn)
writer.close()
This creates out.xlsx (screenshot from LibreOffice):
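To sanity-check which rows would be merged without opening the spreadsheet, the row ranges can be listed from the group labels alone. This is a sketch under assumptions: merge_ranges and header_rows are hypothetical names, and a default 0..n-1 index is assumed, as in the code above.

```python
import pandas as pd


def merge_ranges(run_id: pd.Series, header_rows: int = 1):
    """List (first_row, last_row) worksheet rows for runs longer than one row.

    Hypothetical helper for illustration; assumes a default 0..n-1 index.
    """
    ranges = []
    for _, idx in run_id.groupby(run_id).groups.items():
        if len(idx) > 1:
            # Offset by the header row(s) that to_excel writes first
            ranges.append((int(idx[0]) + header_rows, int(idx[-1]) + header_rows))
    return ranges


# Illustrative group labels: rows 0-1 form one run, rows 3-5 another.
example_ids = pd.Series([1, 1, 2, 3, 3, 3, 4])
print(merge_ranges(example_ids))
```

Each tuple corresponds to the first and last row arguments that merge_fn passes to worksheet.merge_range.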