Python-Django 设置 Pandas DataFrame 的多重索引不会分组/合并最后一个索引

Question

设置 pandas DataFrame 的索引时，列数组的最后一个元素不会将项目合并/分组在一起。

假设以下测试数据：

test_data = {
    "desk": ["DESK1", "DESK2", "DESK3", "DESK4", "DESK5", "DESK6", "DESK7", "DESK8", "DESK9", "DESK10"],
    "phone": ["111-1111", "111-1111", "111-1111", "111-1111", "444-4444", "444-4444", "111-1111", "111-1111", "123-4567", "123-4567"],
    "email": ["[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"],
    "team1": ["Adam", "xxxx", "Tiana", "", "Gina", "Gina", "Ruby", "Becca", "John", ""],
    "team2": ["", "", "Dime", "", "Ed", "", "", "", "Fa", "Tim"],
}

创建了一个数据框：

import io
import pandas as pd
from django.http.response import HttpResponse
from rest_framework import status

### Create DataFrame from test_data
df = pd.DataFrame(test_data)

然后尝试写入并返回文件

### Write & return the file
with io.BytesIO() as buffer:
    with pd.ExcelWriter(buffer) as writer:
        df: pd.DataFrame = df
        groupby_columns = ['desk', 'phone', 'email']
        df.set_index(groupby_columns, inplace=True, drop=True, append=False )
        df.to_excel(writer, index=True, sheet_name="Team Matrix", merge_cells=True)

        return HttpReponse(
            buffer.getvalue(),
            headers={
                "Content-Type": "application/vnd.openxmlformats-" "officedocument.spreadsheetml.sheet",
                "Content-Disposition": "attachment; filename=excel-export.xlsx",
            },
            status=status.HTTP_201_CREATED,
        )

返回以下文件：

但是我想要的是如果数据相同，则前三列（办公桌、电话、电子邮件）要合并，使用上面的代码它可以对办公桌和电话列进行合并，但电话列不会像其他两列那样分组/合并。

Answer 1

一种可能的解决方案，将空（

""

）值放入所需的单元格中，然后合并单元格：

这将创建一个带有空单元格的新数据框：

def fn(x):
    x.loc[x.index[0] + 1 :, ["desk", "phone", "email"]] = ""
    return x


empty_rows = df.loc[:, ["team1", "team2"]].eq("").all(axis=1)
groups = ((df["email"] != df["email"].shift()) | empty_rows).cumsum()
df = df.groupby(groups, group_keys=False).apply(fn)

打印：

     desk     phone                  email  team1 team2
0   DESK1  111-1111  [email protected]   Adam      
1                                            xxxx      
2   DESK3  111-1111  [email protected]  Tiana  Dime
3   DESK4  111-1111  [email protected]             
4   DESK5  444-4444     [email protected]   Gina    Ed
5                                            Gina      
6   DESK7  111-1111         MagicSchoolbus   Ruby      
7   DESK8  111-1111        [email protected]  Becca      
8   DESK9  123-4567        [email protected]   John    Fa
9  DESK10  123-4567       [email protected]          Tim

此步骤将合并 Excel 中的前 3 列：

def merge_fn(g):
    if len(g) == 1:
        return
    first, last = g.index[0] + 1, g.index[-1] + 1
    worksheet.merge_range(first, 0, last, 0, g.iat[0, 0], merge_format)
    worksheet.merge_range(first, 1, last, 1, g.iat[0, 1], merge_format)
    worksheet.merge_range(first, 2, last, 2, g.iat[0, 2], merge_format)


writer = pd.ExcelWriter("out.xlsx", engine="xlsxwriter")

df.to_excel(writer, sheet_name="Team Matrix", index=False)
workbook = writer.book
worksheet = writer.sheets["Team Matrix"]
merge_format = workbook.add_format({"align": "left", "valign": "top", "border": 0})
df.groupby(groups, group_keys=False).apply(merge_fn)

writer.close()

创建

out.xlsx

（来自 LibreOffice 的屏幕截图）：

Python-Django 设置 Pandas DataFrame 的多重索引不会分组/合并最后一个索引

问题描述投票：0回答：1

1个回答

最新问题

Python-Django 设置 Pandas DataFrame 的多重索引不会分组/合并最后一个索引

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1